TEC-0046 


AD-A283  257 


Representation, Modeling and Recognition of Outdoor Scenes
First Annual Report




US Army Corps of Engineers
Topographic Engineering Center


Martin  A.  Fischler 
Robert  C.  Bolles 


SRI  International 

333  Ravenswood  Avenue 

Menlo  Park,  CA  94025-3493 


November  1993 


Approved  for  public  release;  distribution  is  unlimited. 



Prepared  for: 

Advanced  Research  Projects  Agency 
3701  North  Fairfax  Drive 
Arlington,  VA  22203-1714 



Monitored  by: 

U.S.  Army  Corps  of  Engineers 
Topographic  Engineering  Center 
7701  Telegraph  Road 
Alexandria,  Virginia  22315-3864 


<\- 




Destroy this report when no longer needed.
Do  not  return  it  to  the  originator. 


The findings in this report are not to be construed as an official Department of the Army
position unless so designated by other authorized documents.


The citation in this report of trade names of commercially available products does not
constitute official endorsement or approval of the use of such products.




REPORT  DOCUMENTATION  PAGE 




1. AGENCY USE ONLY (Leave blank)

2. REPORT DATE
November 1993

3. REPORT TYPE AND DATES COVERED
First Annual Report, Mar. 1992 - Mar. 1993

4. TITLE AND SUBTITLE
Representation, Modeling and Recognition of Outdoor Scenes

5. FUNDING NUMBERS
DACA76-92-C-0008

6. AUTHOR(S)
Martin A. Fischler, Robert C. Bolles

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)
SRI International
333 Ravenswood Avenue
Menlo Park, CA 94025-3493

8. PERFORMING ORGANIZATION REPORT NUMBER

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)
Advanced Research Projects Agency
3701 North Fairfax Drive, Arlington, VA 22203-1714

U.S. Army Topographic Engineering Center
7701 Telegraph Road, Alexandria, VA 22315-3864

10. SPONSORING/MONITORING AGENCY REPORT NUMBER
TEC-0046

12a. DISTRIBUTION/AVAILABILITY STATEMENT
Approved for public release; distribution is unlimited.

12b. DISTRIBUTION CODE
13. ABSTRACT (Maximum 200 words)

The primary goal of this project was to advance the state of the art in scene interpretation for autonomous
systems that operate in natural terrain. In particular, techniques are being developed for representing
knowledge about complex cultural and natural environments so that a computer vision system can successfully
plan, navigate, recognize and manipulate objects and answer questions or make decisions relevant to this
knowledge. The initial results are centered on the development of representations and associated methods for
rapidly modeling natural terrain (from image sequences) at a level of organization higher than that of the
conventional dense array of depths. This work will provide the essential advance needed to turn raw geometric
measurements into timely information usable by robotic navigation and planning systems. Work is also
progressing on two additional problems: modeling compact 3-D objects from their projected 2-D contours, and
the problem of recognizing important classes of natural and man-made objects, especially roads, trees and
rocks.


14. SUBJECT TERMS

Machine Vision, Automated Scene Analysis, Interactive Scene Analysis,
Object Recognition, Terrain Modeling, Automated Cartography, Feature
Extraction, Delineation, Partitioning, Geometric Modeling


15. NUMBER OF PAGES
71

16. PRICE CODE

17. SECURITY CLASSIFICATION OF REPORT
UNCLASSIFIED

18. SECURITY CLASSIFICATION OF THIS PAGE
UNCLASSIFIED

19. SECURITY CLASSIFICATION OF ABSTRACT

20. LIMITATION OF ABSTRACT
UNLIMITED

NSN 7540-01-280-5500
Standard Form 298 (Rev. 2-89)

PREFACE 


This research is sponsored by the Advanced Research Projects Agency (ARPA) and monitored
by the U.S. Army Topographic Engineering Center (TEC) under Contract DACA76-92-C-0008, titled
"Representation, Modeling and Recognition of Outdoor Scenes, First Annual Report". The ARPA
Program Manager is Dr. Oscar Firschein, and the TEC Contracting Officer's Representative is Ms.
Lauretta Williams.




REPRESENTATION, MODELING AND RECOGNITION OF OUTDOOR SCENES


Martin  A.  Fischler  and  Robert  C.  Bolles 
Principal  Investigators 


OBJECTIVE: 

Our primary goal in this project is to advance the state of the art in scene
interpretation for autonomous systems that operate in natural terrain. In
particular, techniques are being developed for representing knowledge about
complex cultural and natural environments so that a computer vision system can
successfully plan, navigate, recognize, and manipulate objects and answer
questions or make decisions relevant to this knowledge.


APPROACH: 

This  work  integrates  advances  in  four  separate  technologies  to  achieve  the 
goal  of  providing  a  foundation  for  the  design  of  highly  competent  machine  vision 
systems  capable  of  autonomous  operation  in  the  outdoor  world. 

First,  stored  knowledge  (such  as  map  data  and  object  models)  provides  the  basis 
for  invoking  context,  function,  and  purpose,  in  addition  to  the  use  of  visually 
observed  geometric  shape,  to  recognize  scene  objects. 

Second,  we  are  developing  compact  and  expressive  representations  for  modeling, 
and  ultimately  recognizing,  objects  encountered  in  the  natural  world. 
Computational  efficiency,  and  thus  real  time  performance,  is  critically 
dependent  on  using  effective  representations  for  both  models  and  sensed  data. 

Third,  global  optimization  techniques  are  being  developed  that  require  reasonable 
amounts  of  computation,  but  which  are  expected  to  produce  results  beyond  those 
obtainable  by  local  analysis  methods. 

Fourth,  techniques  are  being  developed  that  are  able  to  simultaneously,  or 
incrementally,  exploit  multiple  views  of  a  scene  in  compiling  a  complete  scene 
model.  For  example,  in  our  previous  work  we  have  been  able  to  demonstrate  that 
the  integrated  analysis  of  a  motion  sequence  can  be  used  to  construct  a 
geometric  scene  model  that  is  superior  to  a  sequence  of  independent  stereo 
reconstructions.


PROGRESS: 

This  program  builds  on  our  previous  ARPA  research.  Our  initial  results  are 
centered  on  the  development  of  representations  and  associated  methods  for 
rapidly  modeling  natural  terrain  (from  image  sequences)  at  a  level  of 
organization  higher  than  that  of  the  conventional  dense  array  of  depths.  This 
work  will  provide  the  essential  advance  needed  to  turn  raw  geometric 
measurements  into  timely  information  usable  by  robotic  navigation  and  planning 
systems. Work is also progressing on two additional problems: modeling compact
3-D objects from their projected 2-D contours, and the problem of recognizing
important  classes  of  natural  and  man-made  objects  —  especially  roads,  trees, 
and  rocks. 

SUMMARY  OF  RECENT  ACCOMPLISHMENTS: 

-Developed  an  approach  for  integration  of  information  acquired  from  multiple 
views  of  a  scene  into  a  description  of  scene  geometry.  The  approach  uses  a  new 
class  of  geometric  primitives  which  allows  easy  expression  of  known  constraints 
and  observed  data,  and  also  allows  the  use  of  practical  optimization  based 
solution techniques. This work will provide an effective way of allowing a
robotic  system  to  incrementally  build  a  progressively  more  accurate  and  complete 
model  of  the  environment  in  which  it  is  operating.  A  paper  describing  this 
work,  intended  for  journal  publication,  has  been  completed  and  is  included  in 
this report as Appendix A; also see the detailed discussion of this topic
provided  in  a  following  section. 

-Made a significant new advance in the long-standing problem of duplicating
human performance in recovering 3-D models of terrain and man-made objects from
qualitative and imprecise line drawings (e.g., of terrain elevations as in an
approximate and uncalibrated contour map, or building edges as in a single
approximate projection of the corresponding wire-frame). This work can greatly
simplify communication problems between man and machine in such applications as
robotic mission planning and in construction of databases for use in robotic
navigation. A paper describing this work has been published in the International
Journal of Computer Vision ("An optimization based approach to the
interpretation of single line drawings as 3-D wire frames," IJCV 9(2): 113-136,
Nov 1992); a reprint is enclosed as Appendix B. On-going work has led to (new)
additional results of both theoretical and practical importance; these new
results will be described in a later report.

-The problem of automatically recognizing objects appearing in images of the
outdoor world has proven to be extremely difficult, in part because, in addition
to all the other difficulties of object recognition, we must now also contend
with the lack of explicit shape models. While most of the current (successful)
computer-based recognition approaches rely on explicit knowledge of shape,
rocks, trees, and other natural objects cannot be successfully described in this
way; even such generic man-made objects as roads, bridges, and buildings are
more likely to satisfy functional constraints rather than being exemplars of
some geometric blueprint. In order to replace explicit shape with a more
general way of describing natural objects (and complex man-made structures), a
large number of geometric primitives have been proposed that are also suitable
for detection by automatic image analysis algorithms (e.g., edges, textures,
fractals). The result of much of this past work is that, while often promising,
the techniques are not sufficiently reliable to provide a basis for the
knowledge-based analysis needed to complete the recognition task. What is
required are a few techniques that can very reliably organize the pixel-level
image data as a basis for higher level analysis. Finding the appropriate
combination of low-level data description, and associated extraction techniques,
is thus a key problem in machine vision and one of our primary concerns in this
project. In addition to our work relevant to this topic discussed above, we have
focused on extracting coherent line (as distinct from edge) features in single
gray-level images. We note that a line sketch of some object or scene is often
sufficient to depict the imaged information in a very compact way. Two
techniques have emerged from this work that appear to meet the criterion of
generality and robustness. The first is a generic way to find candidate line
structure in an image; this work will be described in a later report. The
second is a way to organize such data into perceptually coherent and
semantically meaningful units. In Appendix C of this report we describe our
progress in the design of a curve partitioning technique that is extremely
robust in achieving the perceptual organization task; we also describe how this
technique can be applied to the problem of road delineation in aerial images.


DETAILED DISCUSSION OF RECENT WORK ON GEOMETRIC
RECONSTRUCTION FROM MULTIPLE VIEWS:

To reconstruct object surfaces, one can start with a number of measuring
techniques, for example laser rangefinding, stereo or 3D scanners, all of which
provide raw information about the location of points in space. These points,
however, often form potentially noisy "clouds" of data instead of the surfaces
one expects.

Deriving  the  surfaces  from  such  data  is  a  difficult  task  because: 

-the 3D points may form a very irregular sampling of the space,

-they may have been produced by several sensors or derived from several
viewpoints so that it becomes impossible to work only in the
imaging plane of any one sensor,

-several surfaces can overlap; simple interpolation will not work,

-the sensors and algorithms make mistakes that must be properly dealt with.


In this research effort, we address the problem of determining the 3-D shape and
material properties of surfaces by combining the information provided by active
or passive ranging techniques with that present in multiple 2-D intensity
images. As discussed in our previous reports, we are investigating two different
approaches, the first based on local surfaces and the second on global ones.

At present, most of our efforts have been devoted to the global surface
approach. It relies on hexagonal triangulations that can be deformed to recover
both the geometry and physical properties of surfaces of interest.


Surface Geometry

Given camera models for the images being analyzed, the corresponding projections
of the 3-D surface points appearing in the images can be computed and, under
the usual stereo assumption, must have comparable grey levels. Our algorithm
optimizes the placement of surface vertices to minimize the overall difference
in grey levels while preserving surface smoothness. The actual criterion we use
is a linear combination of the sums of the variance of grey levels across images
and of the sums of the surface curvatures at the vertices. We use a
conjugate-gradient descent algorithm embedded in a continuation method to
perform the optimization: we first optimize with a strong smoothness
constraint; we then reduce the constraint progressively.
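The continuation idea described above can be illustrated on a toy problem. The
following is a minimal sketch, not the report's implementation: it assumes a 1-D
height profile instead of a 3-D mesh, uses a squared misfit to given samples as
a stand-in for the multi-image grey-level variance, and uses plain gradient
descent rather than conjugate gradient. All names (objective, descend,
continuation) and the weight schedule are hypothetical.

```python
def objective(z, data, lam):
    """Data misfit plus lam times a discrete curvature penalty."""
    misfit = sum((zi - di) ** 2 for zi, di in zip(z, data))
    curvature = sum((z[i - 1] - 2 * z[i] + z[i + 1]) ** 2
                    for i in range(1, len(z) - 1))
    return misfit + lam * curvature


def gradient(z, data, lam):
    """Analytic gradient of objective() with respect to each height."""
    g = [2 * (zi - di) for zi, di in zip(z, data)]
    for i in range(1, len(z) - 1):
        c = z[i - 1] - 2 * z[i] + z[i + 1]
        g[i - 1] += 2 * lam * c
        g[i] -= 4 * lam * c
        g[i + 1] += 2 * lam * c
    return g


def descend(z, data, lam, steps=500):
    """Fixed-step gradient descent (a simplification of conjugate gradient)."""
    # The Hessian's largest eigenvalue is at most 2 + 32 * lam, so this step
    # size guarantees that the objective never increases.
    step = 1.0 / (2.0 + 32.0 * lam)
    for _ in range(steps):
        g = gradient(z, data, lam)
        z = [zi - step * gi for zi, gi in zip(z, g)]
    return z


def continuation(data, lams=(10.0, 1.0, 0.1)):
    """Optimize with a strong smoothness weight first, then relax it."""
    z = [sum(data) / len(data)] * len(data)  # flat initial surface
    for lam in lams:
        z = descend(z, data, lam)  # warm start from the previous stage
    return z
```

In this convex toy problem the schedule is not strictly needed; the point of a
continuation method is that, for non-convex criteria such as image-based ones,
the heavily smoothed early stages tend to steer the solution away from poor
local minima before the constraint is relaxed.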

Because our surfaces are 3-D objects, we can directly determine the presence of
hidden surfaces and deal effectively with occlusions. In order to detect those
hidden surfaces in an effective manner, we have implemented the algorithm to run
on an SGI machine and exploit the machine's z-buffering capabilities.

So far, in most of our experiments, we have used regular grids and uniform
smoothness constraints. While this is appropriate for surfaces whose properties
remain relatively constant, this is suboptimal for more complex surfaces that
can be more effectively handled using triangulated irregular networks. The
relatively smooth parts of such surfaces should be represented by large patches
while the rougher parts are better described by finer and less constrained
triangulations. We have made progress in implementing such irregular networks
by allowing some of the regular facets to be subdivided as required by the
surface geometry.
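One simple way to subdivide selected facets is to split a triangle into four at
its edge midpoints. The sketch below is only an illustration of that idea; the
report does not say which subdivision rule is used, and the function names and
the refinement predicate are hypothetical.

```python
import math


def midpoint(a, b):
    """Midpoint of two 3-D points."""
    return tuple((x + y) / 2.0 for x, y in zip(a, b))


def area(tri):
    """Area of a 3-D triangle from the cross product of two edge vectors."""
    a, b, c = tri
    u = [q - p for p, q in zip(a, b)]
    v = [q - p for p, q in zip(a, c)]
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def subdivide(tri):
    """Split one facet into four smaller facets at its edge midpoints."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def refine(triangles, needs_refinement):
    """Subdivide only the facets for which the predicate returns True."""
    out = []
    for tri in triangles:
        out.extend(subdivide(tri) if needs_refinement(tri) else [tri])
    return out
```

A real implementation would also have to keep neighboring facets consistent,
since refining one facet but not its neighbor leaves a vertex in the middle of
the neighbor's edge.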

Physical Properties

Many natural surfaces can be modeled by a Lambertian reflectance model whose
albedo depends on the corresponding physical surface properties. Recovering
this albedo is therefore an important first step towards the goal of analyzing
those physical properties and potentially segmenting regions of interest.
Unlike traditional "shape from shading" approaches that work in image space and
assume constant albedo, our technique allows us to assign different albedoes to
the facets of the derived triangulation. We can then optimize the values
assigned to these albedoes and also find (or use the known) location of the
light source to maximize the similarity between the shaded image derived from our
models and the real images.
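Under the Lambertian model, the observed intensity of a facet is its albedo
times the cosine of the angle between the facet normal and the light direction.
The following minimal sketch inverts that relation for a single facet. It
assumes a distant light source of unit strength and a known light direction;
the function names are hypothetical and this is not the optimization procedure
used in the report.

```python
import math


def normalize(v):
    """Scale a 3-D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def facet_normal(p0, p1, p2):
    """Unit normal of a triangular facet (counter-clockwise vertex order)."""
    u = tuple(b - a for a, b in zip(p0, p1))
    w = tuple(b - a for a, b in zip(p0, p2))
    cross = (u[1] * w[2] - u[2] * w[1],
             u[2] * w[0] - u[0] * w[2],
             u[0] * w[1] - u[1] * w[0])
    return normalize(cross)


def facet_albedo(intensity, normal, light_dir):
    """Albedo implied by intensity = albedo * (normal . light)."""
    shading = sum(n * l for n, l in zip(normal, normalize(light_dir)))
    if shading <= 0.0:
        return None  # facet faces away from the light; albedo is unobservable
    return intensity / shading
```

For example, a horizontal facet lit from 60 degrees off its normal has a
shading factor of 0.5, so an observed intensity of 0.3 implies an albedo of
about 0.6.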

We are performing experiments with the above method for computing albedo given
surfaces originally derived using stereo. The objective function we optimize
enforces albedo smoothness while minimizing intensity difference between the
shaded images and the real ones. To make this approach fully general, we will
introduce albedo discontinuities to account for abrupt changes in surface
material type. We will also attempt to determine those classes of natural
objects and terrain types for which the Lambertian model is appropriate by
examining the variance in intensity across images of the same scene acquired
from different viewpoints.
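The viewpoint test mentioned above rests on the fact that a Lambertian surface
point looks equally bright from every direction, so its intensity should vary
little across views. A minimal sketch of such a check follows; the function
names and the tolerance are hypothetical, and real data would also require
compensating for sensor gain and occlusion.

```python
from statistics import pvariance


def cross_view_variance(samples):
    """Variance of one surface point's intensity across several views."""
    return pvariance(samples)


def looks_lambertian(samples, tolerance):
    """True when the intensity is nearly the same in every view."""
    return cross_view_variance(samples) <= tolerance
```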


Our ultimate goal in the above two tasks is to be able to optimize
simultaneously the vertex positions and the surface albedoes in order to compute
surface geometry and photometry. Our current focus in this task is to combine
the stereo objective function with the photometric one in order to achieve a
more complete description of the scene.

Implementation and Testing

In the past few months we have refined and tested our method for reconstructing
both the shape and reflectance properties of physical surfaces from the
information present in multiple images. We have, so far, considered two classes
of information. The first class contains the information that can be extracted
from a single image, such as texture gradients, shading, and occlusion edges.
We take advantage of the fact that multiple images enhance the utility of this
type of information by allowing for consistency checks across the images as well
as the use of averaging to improve precision. The second class contains
information that requires at least two images for its extraction, such as the
depth of corresponding points found in two input images through the use of
stereo triangulation.

Our surface reconstruction method uses an object-centered representation,
specifically, a hexagonally-connected 3-D mesh of vertices with triangular
facets. Such a representation accommodates the two classes of information
mentioned above, as well as multiple images (including motion sequences of a
rigid object) and self-occlusions. We have chosen to model the surface material
using the Lambertian reflectance model with variable albedo, though
generalizations to specular surfaces are possible. Consequently, the natural
choice for the monocular information source is shading, while intensity is the
natural choice for the image feature used in multi-image correspondence. Not
only are these the natural choices when we are able to assume a Lambertian
reflectance model, they are complementary: intensity correlation is most
accurate wherever the input images are highly textured, and shading is most
accurate when the input images have smooth intensity variation. Since we wish
to deal with surfaces with non-uniform albedo, we have developed a new approach
to incorporating shading information that uses the variation in computed albedo
from facet to facet as the indicator of a correct surface reconstruction.

We use an optimization approach to reconstruct the surface shape and its
material properties from the input images. That is, we alter the shape and
reflectance properties of the surface mesh so as to minimize an objective
function, given an initial surface estimate provided by other means, such as a
standard stereo algorithm. The objective function is a linear combination of
an intensity correlation component, an albedo variation component, and a surface
smoothness component. The first two components are a function of the
intensities projected onto the triangular facets from the input images (taking
occlusions into account), and are weighted according to the amount of texture in
the intensities, for the reasons mentioned in the previous paragraph. The
geometric smoothness component is slowly decreased during the optimization
process to allow for an accurate estimate of the surface shape and reflectance.
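The structure of such an objective can be sketched in a few lines. The report
does not give the actual weighting formula, so the sketch below makes a simple
assumption: a texture measure in [0, 1] shifts weight from the albedo variation
term toward the intensity correlation term, and the smoothness weight follows a
geometric schedule. All names and numeric choices are hypothetical.

```python
def blended_objective(correlation, albedo_variation, smoothness,
                      texture, smooth_weight):
    """Linear combination of the three components for one facet.

    texture is an assumed measure in [0, 1]: 1 means highly textured (trust
    intensity correlation), 0 means textureless (trust albedo variation).
    """
    if not 0.0 <= texture <= 1.0:
        raise ValueError("texture must lie in [0, 1]")
    return (texture * correlation
            + (1.0 - texture) * albedo_variation
            + smooth_weight * smoothness)


def smoothness_schedule(start, factor, stages):
    """Smoothness weights that shrink geometrically from stage to stage."""
    return [start * factor ** k for k in range(stages)]
```

For example, smoothness_schedule(1.0, 0.5, 3) yields the weights 1.0, 0.5 and
0.25, one per optimization stage.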

We have implemented an algorithm employing these three terms and have performed
extensive experiments using synthetic images as well as real aerial and face
images. The strengths of the approach include:

- The use of the 3-D surface mesh allows us to deal with self-occlusions and thus
effectively merge information from several potentially very different viewpoints
to eliminate "blind-spots."

- By combining stereo and shape from shading, and weighing appropriately the
reliability of their respective contributions, we can obtain results that are
better than those produced by either technique alone.

- Using the facets to perform the stereo computation frees us from the
constant-depth assumption that standard correlation-based stereo techniques
make. It becomes possible to recover accurately the depth of sharply sloping
surfaces (such as that of a sharp ridge).

- The shape from shading component does not make a constant-albedo assumption
unlike most shading algorithms. Instead, we only make the weaker and much more
general assumption that albedoes vary slowly across textureless areas.


Appendix  A: 


An  Optimization-Based  Approach 
to  the  Interpretation  of  Single  Line 
Drawings  as  3D  Wire  Frames 


International Journal of Computer Vision, 9:2, 113-136 (1992)
© 1992 Kluwer Academic Publishers, Manufactured in The Netherlands.


An  Optimization-Based  Approach  to  the  Interpretation  of  Single  Line 
Drawings as 3D Wire Frames

YVAN G. LECLERC AND MARTIN A. FISCHLER

Artificial Intelligence Center, SRI International, 333 Ravenswood Ave., Menlo Park, CA 94025
Received 

Abstract

Line drawings provide an effective means of communication about the geometry of 3D objects. An understanding
of how to duplicate the way humans interpret line drawings is extremely important in enabling man-machine
communication with respect to images, diagrams, and spatial constructs. In particular, such an understanding could
be used to provide the human with the capability to create a line-drawing sketch of a polyhedral object that the
machine can automatically convert into the intended 3D model.

A recently published paper (Marill 1991) presented a simple optimization procedure supposedly able to duplicate
human judgment in recovering the 3D "wire frame" geometry of objects depicted in line drawings. Marill
provided some impressive examples, but no theoretical justification for his approach. Here, we introduce our own
work by first critically examining Marill's algorithm. We provide an explanation for why Marill's algorithm was
able to perform as well as it did on the examples he presented, discuss its weaknesses, and show very simple
examples where it fails. We then provide an algorithm that improves on Marill's results. In particular, we show
that an effective objective function must favor both symmetry and planarity; Marill deals only with the symmetry
issue. By modifying Marill's objective function to explicitly favor planar-faced solutions, and by using a more
competent optimization technique, we were able to demonstrate significantly improved performance in all of the
examples Marill provided and those additional ones we constructed ourselves. Finally, we examine some questions
relevant to the implications of this work for understanding the human ability to interpret line drawings.


1 Introduction

The interpretation of line drawings has been an important
focus for research in machine vision since the
field's inception. There seems to be little question that
human subjects can easily recover 3D models from 2D
line drawings depicting many classes of objects. One
such class of special interest has been called the "blocks
world." This class consists primarily of polyhedral
solids in 3D Euclidean space and the projections of the
visible edges of these objects onto a 2D plane (which
we call the line drawing). Given a single line drawing
of a blocks world scene, normal human subjects will
usually arrive at the same 3D interpretation, even
though there may be a very large number of possible
3D objects that could have produced the given drawing.
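The ambiguity noted above is easy to demonstrate. The following minimal sketch
assumes an orthographic projection along the z axis (the text only says that
edges are projected onto a 2D plane) and uses hypothetical function names:
every assignment of depths to the drawn vertices yields a different 3D object
with exactly the same line drawing.

```python
def project(vertices):
    """Orthographic projection: drop the depth coordinate of each vertex."""
    return [(x, y) for x, y, _ in vertices]


def lift(drawing, depths):
    """Build a 3D interpretation by attaching one depth to each 2D vertex."""
    return [(x, y, z) for (x, y), z in zip(drawing, depths)]
```

Recovering a 3D model from a single drawing therefore amounts to choosing one
depth per vertex, and some additional criterion is needed to single out the
interpretation that people actually perceive.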

Beginning with the work of Guzman in 1968, there
has been a concerted effort by vision researchers to
develop an algorithmic procedure that could duplicate
human performance in interpreting line drawings, at
least with respect to blocks world objects. A significant
body of work in this area was produced by such
prominent scientists as Clowes (1971), Huffman (1971),
Waltz (1972), Mackworth (1973), Kanade (1980), Draper
(1981), and Sugihara (1982, 1984). However, the problem
as originally formulated, devising a procedure for
recovering psychologically plausible 3D models from
line drawings, remains unsolved. (A psychologically
plausible reconstruction of a line drawing is the one
that virtually all people will accept.)

The earliest work by Guzman was heuristic in
nature, failed in many cases where humans had no trouble
in finding appropriate interpretations, and did not
actually return a 3D model, but rather partitioned the
scene into separate polyhedral objects. Clowes, Huffman,
Waltz, Mackworth, and Kanade formalized and
extended the work of Guzman, but did not solve the
original problem. They were (usually) able to label the
edges of the line drawing to correctly reflect a consistent
3D interpretation if one existed, or could assert
that the drawing did not correspond to a realizable
blocks world scene. Mackworth and Kanade explicitly
exploited the planarity of the faces of blocks world and
"Origami" objects (employing a "gradient space"
representation) to accomplish a form of semiquantitative
recovery. In addition to consistent edge labeling, they
could also constrain the relative orientation of the faces
of the target 3D model. The labels could describe the
edges as being convex, concave, occluding, and so
forth, but still, for the general case, no explicit 3D
model was returned (without introducing additional
constraints) and the algorithms would make occasional
errors.

In a series of papers, Sugihara reformulated the
realizability and recovery problems for line drawings
of polyhedra (both with and without hidden lines
removed) in purely algebraic terms. He required as input
a specification of the vertexes defining each of the
individual planar faces of the polyhedra, and also required
that the implied line drawing be a general-position
projection of the polyhedra. With this approach
he succeeded in providing an algebraic criterion as a
necessary and sufficient condition for a line drawing
to represent a physically realizable polyhedral object.
He could also constrain the space of feasible solutions,
and obtain a unique solution if enough additional constraints
were provided. These additional constraints
were obtained from information beyond that provided
by the line drawing (e.g., shading or texture information).
Sugihara's work was an important advance, but
again it fell short of the original goal. It will rarely be
the case that a unique reconstruction is implied by the
line drawing, and thus the primary objective of duplicating
human performance in this regard is not met.

Our motivation for writing this article was supplied,
in part, by a recent publication authored by T. Marill
(1991). He refocused on the original problem of human
interpretation of single line drawings as 3D structures;
he did not restrict his universe to blocks world objects
nor did he demand that the line drawings be complete.
The surprising thing about his work was that he used
an optimization approach involving (seemingly) an
almost trivial objective function, and the simplest
possible descent algorithm to find a solution, and yet
provided examples of reconstructed objects that were,
intuitively, extremely good. (Figure 1, examples A
through I, shows the line drawings used in Marill's
experiments.) However, his paper provided no justification
for why the algorithm should work, and thus no
basis for judging its generality or insight into how it
could be improved (should this be desirable).
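The excerpt above does not spell out Marill's objective function. It is
commonly described as the standard deviation of the angles formed by pairs of
edges meeting at each vertex, minimized over the unknown vertex depths; that
description is taken here as an assumption, and the names below are
hypothetical. The sketch only evaluates the measure for a given wire frame and
does not perform the descent.

```python
import math
from itertools import combinations


def angle(a, b, c):
    """Angle at vertex b between the edges b-a and b-c, in radians."""
    u = [p - q for p, q in zip(a, b)]
    v = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))


def angle_std(vertices, edges):
    """Standard deviation of all angles between edges sharing a vertex."""
    neighbors = {}
    for i, j in edges:
        neighbors.setdefault(i, []).append(j)
        neighbors.setdefault(j, []).append(i)
    angles = [angle(vertices[a], vertices[v], vertices[c])
              for v, nbrs in neighbors.items()
              for a, c in combinations(nbrs, 2)]
    mean = sum(angles) / len(angles)
    return math.sqrt(sum((t - mean) ** 2 for t in angles) / len(angles))
```

A cube wire frame, where every such angle is a right angle, scores essentially
zero, whereas a flat figure with the same connectivity scores noticeably
higher. This is one way to see why such a measure favors regular, symmetric
interpretations.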

The first reference we have found that presents the
case for choosing between various interpretations of a
line drawing based on an objective function is Hochberg
and McAlister (1953). In their paper, they "showed that:
(1) some variants of the Necker cube are more likely
to be described as 2D figures, and some are more likely
to be described as 3D; and (2) these differences could
be predicted by an objective and plausible coding
scheme. Within this scheme, the economy of description
was assessed by (among other measures) the
number of lines and angles contained within the coding.
Thus, the costs and benefits of 2- versus 3-D interpretations
could be assessed. Figures that could be coded
more simply under a depth interpretation were, in fact,
seen in depth; those that could not be simplified in this
way were seen to lie in the picture plane" (Pomerantz
& Kubovy 1981, pp. 439-440).

Barrow and Tenenbaum (1981) suggested ideas
similar to Marill’s for interpreting line drawings (both
for simple closed curves and polyhedra), but did not
pursue the ideas in greater depth. More recently, Barnard
and Pentland (1983) and Pentland and Kuo (1990)
have pursued Barrow and Tenenbaum’s approach for
simple curves and line drawings of surfaces by finding
the smoothest curve (or surface) corresponding to the
line drawing.

In this article we introduce our own work by first
critically examining Marill’s algorithm. We provide an
explanation for why Marill’s algorithm was able to perform
as well as it did on the examples he presented,
discuss its weaknesses, and show very simple examples
where it fails (figure 1, examples J through N). We then
provide an algorithm that improves on Marill’s results
for all nine of his examples, and also successfully deals
with the simple cases where Marill fails. Finally, we
examine some questions relevant to the implications of
this work for understanding the human ability to interpret
line drawings.

We see the work described here as being of both
theoretical and practical interest. The practical utility
of this work is its relevance to man-machine communication
about 3D structures via line drawings; in particular,
providing the human with the capability to
create a line-drawing sketch of a polyhedral object that
the machine can automatically convert into the intended




Fig. 1. The line drawings examined in this article. Examples A through I are taken from Marill’s paper. Examples J through N are line drawings
introduced here for which Marill’s algorithm failed to recover a psychologically plausible 3D model. Example O is a line drawing for which
a psychologically plausible 3D model is not feasible.


3D model. Deficiencies in providing a complete theory
are not fatal, since auxiliary information can always be
supplied interactively to resolve ambiguities, but the
underlying theory should reduce this “side communication”
to a minimum.

2 Marill’s MSDA Algorithm

Marill’s algorithm consists of two components, an objective
function and a simple descent optimization procedure
for finding a local minimum of this objective
function. The objective function is simply the standard
deviation of all of the angles (SDA) in the recovered 3D
object with respect to their common mean. Marill calls
the minimization of the SDA the MSDA principle.

The input line drawing is specified as a set of points
(vertexes) and lines; each point is represented by an
(x, y) coordinate pair, and each line is represented by an
integer pair corresponding to the sequence numbers of
the two points it joins. The representation of the recovered
3D object involves supplying a third (z) coordinate
for each of the originally specified points. This is what
we call the orthographic extension of the line drawing.
It is actually a wire frame rather than a solid object.

To evaluate the objective function for a given proposed
solution, every pair of lines terminating on a




point (as defined in the input specification) is considered
to form a separate angle. Thus, if five lines terminate
on the same point, every potential 3D solution
contains ten angles at this point that contribute to the
objective function. Note that the intersection between
two lines that happen to cross at intermediate points
of their extent in the line drawing is not treated as a
vertex, and does not contribute to the objective function
(even if the lines were to lie in the same plane in
the 3D reconstruction). Similarly, two distinct vertexes
can have the same (x, y) coordinates in the line drawing,
but then the line segments terminating on the
distinct vertexes do not interact to form angles (even
if the vertexes coincide in the 3D reconstruction).
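
The angle bookkeeping just described can be sketched in Python. This is a minimal illustration rather than Marill's code: the names `vertex_angles` and `sda` are hypothetical, angles are measured in radians, and the population form of the standard deviation is an assumption.

```python
import math
from itertools import combinations


def vertex_angles(points, lines, z):
    """Return the 3D angle (radians) of every pair of lines that share a vertex."""
    # Lift each (x, y) point to 3D with its proposed z coordinate.
    p3 = [(x, y, zi) for (x, y), zi in zip(points, z)]
    # Neighbours are grouped by vertex index, so distinct vertexes that happen
    # to share (x, y) coordinates never interact, and crossings are ignored.
    neighbours = {}
    for a, b in lines:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    angles = []
    for v, ends in neighbours.items():
        for a, b in combinations(ends, 2):  # k lines at a vertex give k(k-1)/2 angles
            u = [p3[a][k] - p3[v][k] for k in range(3)]
            w = [p3[b][k] - p3[v][k] for k in range(3)]
            dot = sum(s * t for s, t in zip(u, w))
            norm = math.sqrt(sum(c * c for c in u)) * math.sqrt(sum(c * c for c in w))
            angles.append(math.acos(max(-1.0, min(1.0, dot / norm))))
    return angles


def sda(points, lines, z):
    """Standard deviation of all angles about their common mean (assumed population form)."""
    angles = vertex_angles(points, lines, z)
    mean = sum(angles) / len(angles)
    return math.sqrt(sum((a - mean) ** 2 for a in angles) / len(angles))
```

With five lines meeting at one vertex, `vertex_angles` returns ten angles for that vertex, as stated above. For a prism over a regular hexagon (two 90-degree angles and one 120-degree angle at every vertex), `sda(...) ** 2` evaluates to about 0.0609 in squared radians.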

Thus, given a line drawing with n vertexes, each
possible orthographic extension is represented as a z
vector having n components; the corresponding angles
and SDA are computed to evaluate the proposed solution.
Marill uses a descent technique to search for a
best answer, recognizing that this is simply a heuristic
and that this approach will find only a single local
minimum of his objective function. The input object
has all of its z values initially set to zero; that is, it is
a flat object lying in the (x, y) plane. At each stage in
the search, the SDA of the current z vector is computed
and the program then looks at the children of the current
vector. These 2n children are all the vectors one
step size away from the current vector, and are formed
by both adding and subtracting a specified value (Δz)
to each of the n components in the current z vector.
The value of the SDA is computed for each of these
2n children, and the child with the minimum SDA is
selected as the new current vector. This process is
repeated until no improvement in the SDA is obtained,
and the resulting z vector is returned as the solution
for the first of three rounds of descent. Each additional
round uses a smaller Δz and begins with the result of
the preceding round. Marill experimentally found effective
values of Δz for his three rounds to be 1, 0.5,
and 0.1.
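
The search described above can be sketched as follows. This is a hedged reading of the text, not Marill's program: `descend`, `msda`, and the `max_iters` safety cap are hypothetical, and `sda` is a compact helper that assumes radians and the population standard deviation.

```python
import math
from itertools import combinations


def sda(points, lines, z):
    """Standard deviation (radians) of all angles between lines sharing a vertex."""
    p3 = [(x, y, zi) for (x, y), zi in zip(points, z)]
    ends = {}
    for a, b in lines:
        ends.setdefault(a, []).append(b)
        ends.setdefault(b, []).append(a)
    angles = []
    for v, nbrs in ends.items():
        for a, b in combinations(nbrs, 2):
            u = [p3[a][k] - p3[v][k] for k in range(3)]
            w = [p3[b][k] - p3[v][k] for k in range(3)]
            cos = sum(s * t for s, t in zip(u, w)) / (
                math.dist(p3[a], p3[v]) * math.dist(p3[b], p3[v]))
            angles.append(math.acos(max(-1.0, min(1.0, cos))))
    mean = sum(angles) / len(angles)
    return math.sqrt(sum((t - mean) ** 2 for t in angles) / len(angles))


def descend(points, lines, z, dz, max_iters=10_000):
    """One round: move to the best of the 2n children until none improves the SDA."""
    best = sda(points, lines, z)
    for _ in range(max_iters):  # safety cap; an assumption, not part of the description
        children = []
        for i in range(len(z)):
            for delta in (dz, -dz):
                child = list(z)
                child[i] += delta
                children.append((sda(points, lines, child), child))
        value, child = min(children, key=lambda c: c[0])
        if value >= best:  # no child improves on the current vector
            break
        z, best = child, value
    return z


def msda(points, lines, steps=(1.0, 0.5, 0.1)):
    """Three rounds of descent from the flat drawing, with the step sizes quoted above."""
    z = [0.0] * len(points)
    for dz in steps:
        z = descend(points, lines, z, dz)
    return z
```

Because every accepted move strictly lowers the SDA, the returned z vector never has a higher SDA than the flat drawing, but it is only a local minimum on the grid defined by the step sizes.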

Figure 2 shows a line drawing, its internal representation
as described above, and the reconstructions using
Marill’s algorithm and the algorithm we describe
in section 3.

In the top left window of the figure is the input line
drawing (with the vertexes numbered for reference by
the written representation below). The four windows
on the top right show two views of Marill’s reconstruction
and two views of our reconstruction. In the middle
of the figure is a table showing the internal representation
of the input line drawing. In the first row are the
(x, y) coordinates of the vertexes, in the order shown
on the drawing. In the second row are the integer
pairs representing the lines in the drawing. In the third
row are the sequences of vertexes corresponding to the
planar faces derived according to the rules of appendix
A (see section 3). The reconstructions are discussed
in section 3.3.


2.1 Marill’s Examples

Marill described the application of his algorithm to examples
A through I of figure 1. We categorize these
examples along the following dimensions (based on the
appearance of the input drawing and on the characteristics
of the recovered 3D object):

a. - Three-dimensional [A B D E F G H I]
   - Flat [C]

b. - Blocks world (planar-faced solids with occluded
     edges not rendered) [B H I]
   - Origami (planar-faced, possibly hollow) [C F]
   - Wire frame of blocks world object (all edges of
     a blocks world object are given, and additional
     lines between vertexes of a planar face may be
     added) [A D G]
   - Restricted wire frame (every closed circuit of
     lines, without interior lines in the given input
     representation, corresponds to a planar face) [E]
   - Nonplanar wire frame (none of the above)

c. - Symmetric [A B C E G H]
   - Asymmetric [D F I]

d. - All angles (approximately) equal [A B E F H]
   - A few distinct but mostly repeated angles [C G I]
   - Mostly unequal angles [D]

For the purposes of our discussion, we use Marill’s
categorization and augment it with our own subjective
evaluation where we disagree or need to add additional
attributes to those Marill provides. It is important to
remember that Marill always returns a wire frame as
his solution, regardless of the categorization of the object.
Thus, we would call the wire frame of a blocks
world object a correct solution if it was a geometrically
correct representation of the 3D geometry of the
edges of the psychologically plausible blocks world object
whose orthographic projection corresponded to the
input line drawing, even though the wire frame does
not provide an explicit representation of the grouping
of lines into faces, and so forth.




Points  ...
Lines   (0 1) (1 2) (2 3) (3 4) (4 5) (5 0) (6 7) (7 8) (8 9) (9 10)
        (10 11) (11 6) (0 6) (1 7) (2 8) (3 9) (4 10) (5 11)
Faces   ...

                         Lengths      Angles (Mean / Range)    SDA^2      DP
Original Object          1.0 to 4.0   100.0 / 90.0 to 120.0    0.060923   0.000000
Marill's Reconstruction  1.0 to 3.4    84.0 / 47.5 to 111.2    0.110660   0.044710
Our Reconstruction       1.0 to 3.9   100.0 / 88.6 to 122.6    0.061289   0.000000

Z's
Original Object          0.00 0.10 0.87 1.53 1.43 0.88 -3.23 -2.13 -1.38 -0.80 -0.80 -1.57
Marill's Reconstruction  0.00 0.48 -2.15 -1.48 -2.19 0.72 -0.37 0.33 -3.81 -1.93 -2.38 0.41
Our Reconstruction       0.00 0.12 0.98 1.88 1.55 0.71 -1.99 -1.87 -1.04 -0.?? -0.47 -1.41

Fig. 2. Example J. This line drawing was created by orthographically projecting a specific 3D wire frame object. In this case, the object was
a regular hexagonal prism. Although arbitrary line drawings can be used as input to the reconstruction algorithms described in this article
(with greater or lesser success in reconstruction), all of the examples introduced here were created by starting with specific 3D objects. The
panels in the upper right show two views of the object reconstructed by Marill’s algorithm. The first view is of the object rotated about the
vertical axis by 30 degrees, and the second is of the object rotated about the horizontal axis by 90 degrees. The two panels in the lower right
show two views of the object reconstructed by our algorithm. The table below this is the internal representation of the line drawing used by
the reconstruction algorithms. Note that intersections such as those between lines (1 7) and (2 3) are not represented. Marill’s algorithm uses
only the first two components of this representation. The third component (faces) is derived from the line drawing using the algorithm described
in section 3.1. The table at the bottom shows the results of the reconstructions in written form.


Examples A, B, E, F, and H can all be visualized as
approximately equiangular three-dimensional objects.
That is, each of the objects has an equiangular 3D wire
frame as a psychologically plausible solution. Since
these equiangular solutions exactly satisfy Marill’s
minimum standard deviation of angles (MSDA) criterion,
it is obvious that Marill’s objective function should
prefer what we accept as the correct solutions in these
cases. In the other four cases, supposedly representative
examples of the ability of Marill’s algorithm to deal with
complicated structures having unequal angles, reasonably
correct solutions are also recovered, and it is this
performance we wish to understand.


2.2 The Performance of the MSDA Principle

Given its overall simplicity, it would be quite remarkable
if the MSDA principle generally converged to a
psychologically plausible reconstruction. Unfortunately,
it is rather easy to find examples where this is not the
case, contrary to Marill’s implied competence for the
principle.

Examples J through N of figure 1 are line drawings
for which Marill’s algorithm converged to solutions that
are clearly psychologically implausible, even though
these drawings are not significantly more complicated
or more asymmetric than the examples that Marill used



Points  (0.?? -0.37) (0.34 -0.80) (-0.73 -0.51) (-0.96 0.37) (-0.34 0.?9) (0.73 0.61)
Lines   (0 1) (1 2) (2 3) (3 4) (4 5) (5 0)
Faces   (0 1 2 3 4 5)

                         Z's                               Lengths      Angles (Mean / Range)    SDA^2      DP
Original Object          0.00 0.32 0.24 -0.15 -0.48 -0.40  1.0 to 1.0   120.0 / 120.0 to 120.0   0.000000   0.000000
Marill's Reconstruction  0.00 0.22 -0.13 0.00 0.22 -0.12   0.9 to 1.1   116.3 / 116.3 to 116.2   0.000000   0.030363
Our Reconstruction       0.00 0.34 0.28 -0.11 -0.43 -0.38  1.0 to 1.0   120.0 / 119.6 to 120.4   0.000039   0.000000

Fig. 3. Example K.


(figures 2, 3, 4, 5, and 6 illustrate both Marill’s
reconstructions and our reconstructions, as described
in section 3). In Examples J and K it would appear that
the fault could lie with Marill’s use of a descent
algorithm because the SDA of the psychologically
plausible answer is less than or equal to the SDA for
the solution Marill actually obtains. Thus, one can
argue that a more competent global search strategy
could have found the psychologically plausible answer
using the same objective function. However, Examples
L, M, and N are line drawings for which the SDA of
Marill’s solution is significantly lower than that of the
psychologically plausible solution. Thus, the MSDA
principle is clearly not adequate to reliably handle even
simple line drawings.

Before discussing ways of augmenting the MSDA
principle to obtain a more competent principle and
algorithm, we attempt to explain the performance of
MSDA for line drawings depicting objects that are not
equiangular.

2.3 Evaluating the Performance of the MSDA Principle

It is not immediately obvious why the MSDA principle
should prefer a psychologically plausible answer if the
object depicted in the line drawing contains two or more
significantly different angles (e.g., C, D, G, I, and J).
Marill offers no explanation for this phenomenon, and
thus no way to judge the conditions under which his
algorithm should be expected to succeed or fail. In this
section we provide a partial explanation for cases (such
as C, G, J, K, and L) that have critically important
attributes: the psychologically plausible reconstruction
is a 3D planar-faced object whose faces are either
equiangular or form “complete-star” configurations
(see appendix B).




Points  ...
Lines   (0 1) (1 2) (2 3) (3 4) (4 0) (5 6) (6 7) (7 8) (8 9) (9 10) (10 5)
Faces   (0 1 2 3 4) (5 6 7 8 9 10)

                         Lengths      Angles (Mean / Range)    SDA^2      DP
Original Object          1.0 to 1.2   114.5 / 108.0 to 120.0   0.010876   0.000000
Marill's Reconstruction  0.9 to 1.2   108.0 / 107.7 to 108.2   0.000005   0.165552
Our Reconstruction       1.0 to 1.2   114.5 / 97.9 to 120.4    0.018157   0.000000

Z's
Original Object          0.00 0.20 0.94 1.06 0.47 0.47 0.15 0.23 0.63 0.95 0.87
Marill's Reconstruction  0.00 0.30 0.93 1.04 0.46 0.00 0.41 -0.24 0.00 -0.31 0.24
Our Reconstruction       0.00 0.00 0.00 0.00 0.00 0.08 0.41 0.45 -0.04 -0.36 -0.30

Fig. 4. Example L. Note that Marill’s unacceptable reconstruction has an SDA that is significantly lower than that of the psychologically
plausible original object. Thus, the MSDA principle itself has failed in this instance.


To establish the role played by the above geometric
symmetries, we define a planar orthographic extension
of a simple closed 2D circuit in a line drawing to be
any orthographic extension for which the corresponding
3D contour is planar. If a line drawing contains
more than one simple closed 2D circuit, then a planar
orthographic extension of the entire line drawing exists
if we can cover the line drawing with a set of simple
closed 2D circuits such that (a) every angle in the
drawing is included in at least one circuit, and (b) each
circuit projects to a 3D planar contour.

In appendixes B, C, and D, we provide a number
of theorems that are pertinent to understanding the effectiveness
of the MSDA principle applied to planar
orthographic extensions. The main theorem, appendix
D, asserts that solutions with certain symmetries correspond
to the global minimum of the SDA over all
planar orthographic extensions (the specific symmetry
condition we examine is that all faces must either be
equiangular or form complete-star configurations).

Consequently, if there were some way to consider
as possible solutions only the planar orthographic extensions
of a line drawing (such as the psychologically
plausible solutions for examples A, B, C, G, J, K, and
L), these solutions would be global minima of the SDA
because of the angular symmetry they exhibit. We show
in example L that Marill’s algorithm is not constrained
to search only for planar solutions; while it will also
find solutions with nonplanar faces that have lower
SDAs than the planar solutions, there is still the
possibility that MSDA shows at least a weak inherent
preference for planarity. While we cannot completely
rule out this possibility, it appears that the geometric
constraints inherent in the specific examples Marill
selected, rather than MSDA itself, are largely responsible
for finding planar-faced solutions. Specifically,
triangles in the line drawing will always produce planar
faces in the orthographic extension, and as we prove



Points  ...
Lines   ...
Faces   ...

                         Lengths      Angles (Mean / Range)    SDA^2      DP
Original Object          0.5 to 1.0   98.0 / 90.0 to 135.0     0.071381   0.000000
Marill's Reconstruction  ...          ...                      0.047823   0.004897
Our Reconstruction       0.4 to 1.0   ...                      ...        ...

Z's
Marill's Reconstruction  0.00 0.08 0.07 -0.17 -0.37 0.31 0.97 0.93 0.18 0.01

Fig. 5. Example M. Note that our reconstruction has a slightly lower SDA than that of the original object, indicating the preference of our
algorithm for equiangular faces.


in appendix B, a closed four-sided polygonal space
curve with 90-degree angles at each vertex will always
be a planar configuration. Since in Marill’s examples
listed above, all the faces satisfy these two geometric
conditions, we see why both the desired planarity and
symmetry are present in the computed solutions.

Marill offers only two examples (D and I) that are
not clear instances of the above analysis (all angles
equal, or symmetric planar faces). His solution for example
I is at least questionable since it does not recover
the wire frame of a polyhedral solid (our algorithm
finds such a solution; there is a further discussion of
this subject in sections 3.3 and 4). However, this solution
has almost all of its angles equal to 90 degrees,
and so it needs no further explanation if we accept it
as correct.

Marill’s solution to the asymmetric drawing of example
D looks very reasonable; it has all its angles fairly
well distributed between 40 and 70 degrees, and we
have not found a more symmetric (equiangular) orthographic
extension for this line drawing. However,
because the input line drawing is a completely connected
set of triangular faces, all solutions are constrained
to have planar faces. Thus, a large range of
psychologically plausible objects is accessible to any
reasonable algorithm.

In summary, there is an understandable reason why
Marill’s MSDA principle will sometimes tend to select
planar symmetric 3D wire frames when a purely equiangular
solution is not possible. But we also see that
MSDA will make unacceptable errors, even in simple
cases, because it is not constrained to prefer solutions
with planar faces unless the geometry of the line drawing
itself forces planarity.


3 Our Planarity Enforcing MSDA Algorithm

What’s missing in the MSDA principle is a means for
enforcing the planarity of specific faces. There are two




Points  (-0.58 0.24) (0.95 1.36) (1.50 1.04) (-0.02 -0.08) (0.30 2.89) (0.?? 2.?6)
Lines   (0 1) (1 2) (2 3) (3 5) (5 4) (4 0)
Faces   (0 1 2 3) (0 4 5 3)

                         Z's                               Lengths      Angles (Mean / Range)    SDA^2      DP
Original Object          0.00 0.64 -0.12 -0.77 -0.47 -1.24 1.0 to 2.8   75.0 / 45.0 to 90.0      ...        0.000000
Marill's Reconstruction  0.00 1.21 -1.17 -0.41 -1.18 1.26  2.0 to 3.3   ... / 62.7 to 65.0       ...        0.196270
Our Reconstruction       0.00 1.93 1.69 -0.21 -2.21 -2.40  0.7 to 3.6   ...                      ...        0.000000

Fig. 6. Example N. The SDA of Marill’s unacceptable reconstruction is again significantly lower than that of the psychologically plausible
original object.


parts to this problem: (1) finding those faces in the line
drawing that should be planar in the 3D reconstruction,
and (2) enforcing the planarity of these faces during,
or at least by the end of, the optimization process.


3.1  Finding  Planar  Faces 

The following algorithm for finding the planar faces
is based on a set of psychological assumptions presented
in appendix A. The requirements of items 3, 4, and
5 from appendix A have been composed into the following
algorithm. (In the following discussion, we define
a face in the line drawing to be a sequence of vertexes.)

First, all simple (nonself-intersecting) closed circuits
containing more than three lines are found. (Triangles
are necessarily planar, so they need not be considered.)
Those circuits that are either: (1) completely empty of
both lines and vertexes (such as the faces of example
B); or (2) both convex (in the line drawing) and free
of internal circuits (such as all the faces of example J)
are considered to be planar faces of the wire frame;
call this initial set P0. A circuit is defined to be an internal
circuit to a convex circuit if: (1) all of its vertexes
lie within the convex circuit; and (2) it terminates in
two nonadjacent vertexes of the convex circuit.

Added to P0 are those circuits, defined by the
following algorithm, that are not subsets of any circuit
in P0. First, all triples of consecutive lines such that
the first and third lines are parallel are found (the two
planar faces of example N fall into this category, as do
the “table legs” of example ...). Then, if possible, each
triple of lines is extended with additional consecutive
lines such that all even-numbered lines are parallel to



each other and all odd-numbered lines are parallel to
each other. An example of a closed circuit found this
way is the side of the staircase facing the viewer in example
H; the side of the staircase opposite the viewer
is an example of an open circuit found using this same
rule.
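
The first step of this rule, finding triples of consecutive lines whose first and third lines are parallel, might be sketched as below. The function name `parallel_triples`, the relative tolerance `tol`, and the choice to report each chain once regardless of direction are all assumptions made for illustration; the text does not say how parallelism is tested numerically.

```python
import math


def parallel_triples(points, lines, tol=1e-6):
    """Return vertex chains (a, b, c, d) whose lines a-b and c-d are parallel in 2D."""
    nbrs = {}
    for a, b in lines:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    found = set()
    for b, c in lines:  # b-c is the candidate middle line of the triple
        for a in nbrs[b] - {c}:
            for d in nbrs[c] - {b, a}:
                ux, uy = points[b][0] - points[a][0], points[b][1] - points[a][1]
                vx, vy = points[d][0] - points[c][0], points[d][1] - points[c][1]
                # The 2D cross product vanishes for parallel directions; it is
                # compared against the product of lengths so tol is scale free.
                cross = ux * vy - uy * vx
                scale = math.hypot(ux, uy) * math.hypot(vx, vy)
                if scale > 0 and abs(cross) <= tol * scale:
                    chain = (a, b, c, d)
                    found.add(min(chain, chain[::-1]))  # ignore traversal direction
    return sorted(found)
```

Extending each triple with further lines that alternate between the two directions, as described above, would be a loop over the same parallelism test and is omitted here.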

Finally, pairs of parallel lines lie on planar faces in
general position, so the four vertexes of each pair of lines
are defined to form a planar face (whether or not the
vertexes are connected by lines in the line drawing).
If the pair of lines are not already a subset of a previously
found planar face, these are added to P0.

The above procedure is remarkably robust in dealing
with unconstrained line drawings. For example, we have
yet to find a case where this procedure proposes a
psychologically implausible planar face (it even found
faces in our test cases that we had not originally recognized
as being planar, such as the back side of the staircase
in example H). However, it will sometimes miss
finding a concave planar face, leaving the 3D model
underconstrained, and this can result in the reconstruction
of a psychologically implausible 3D wire frame.
If we know that the line drawings to be processed are
restricted to the projections of blocks world objects with
all planar intersections included in the drawing (i.e.,
no hidden lines removed), then we can be assured that
no faces are missing by (omitting some details here)
first employing the above procedure, next removing all
lines/edges from the drawing that are assigned to two
faces, and then repeating this whole process on the
reduced line drawing until all the edges have been
assigned to exactly two faces (there are some special-position
configurations in which three or more faces
have a single edge in common that we presently do not
deal with). For this more constrained universe of line
drawings where we correctly and completely identify
all the planar faces, we have yet to encounter a case
where our algorithm produces a psychologically implausible
3D model.

3.2  Enforcing  Planarity 

The second requirement, enforcing planarity, is accomplished
by adding a term to the objective function that
is zero when all the designated planar faces are actually
planar, and increases in value as the faces deviate from
planarity (call this term DP). The new objective function,
E(λ), is a linear combination of the previously
defined SDA term and the new DP term:

$$E(\lambda) = \lambda\,\mathrm{SDA}^2 + (1 - \lambda)\,\mathrm{DP}$$


Note that minimizing E(λ) favors planar faces, but
strict planarity is not necessarily assured. This is not
quite what we would like in the ideal case. Ideally, we
would like to find the orthographic extension of the line
drawing with the lowest SDA that has exactly planar
faces (i.e., for which DP = 0). To achieve this, we
use a continuation method (Leclerc 1989; Witkin et al.
1987), which is a sequence of descent steps applied to
E(λ), for decreasing values of λ. The sequence begins
with the initial condition that Marill suggests (z = 0
for all points) and with some initial λ0 ≤ 1. Then, λ
is decreased by a given amount and the descent algorithm
is applied anew, starting at the solution found for
the previous value of λ. This is repeated until λ is sufficiently
close to zero so that no additional changes occur
with further reductions in λ.

Why not simply start with λ close to zero in the first
place? The reason is that when λ is sufficiently close
to zero, the local minima of E(λ) are determined only
by the planarity component. Thus, simply starting with
λ close to zero would not allow us to find solutions with
low SDAs (in fact, when λ = 0, the original line drawing,
which is planar, is a local minimum of E(λ)).
Although we cannot affect the shape of E(λ) when λ
is small, we can choose the starting point for the descent
algorithm. Thus, the purpose of the continuation
method is to choose a sequence of starting points that
are first strongly influenced by the SDA term, but which
eventually become dominated by the DP term. The
method is not guaranteed to find a global minimum of
the objective function, but has yielded excellent answers
for all the examples discussed in this paper.
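
The continuation scheme can be sketched generically as a loop over decreasing λ that restarts the descent from the previous solution. In this hedged sketch the SDA and DP terms are passed in as callables on the z vector; the names `descend` and `continuation`, the λ schedule, the reuse of Marill's three step sizes at every λ, and the `max_iters` cap are all assumptions, since the text does not specify them.

```python
def descend(energy, z, dz, max_iters=10_000):
    """Move to the best of the 2n neighbours until none lowers the energy."""
    best = energy(z)
    for _ in range(max_iters):  # safety cap (assumption)
        children = []
        for i in range(len(z)):
            for delta in (dz, -dz):
                child = list(z)
                child[i] += delta
                children.append((energy(child), child))
        value, child = min(children, key=lambda c: c[0])
        if value >= best:
            break
        z, best = child, value
    return z


def continuation(sda_squared, dp, n,
                 lambdas=(0.9, 0.5, 0.1, 0.01),  # assumed schedule, decreasing toward 0
                 steps=(1.0, 0.5, 0.1)):         # assumed: Marill's step sizes at each stage
    """Minimize E(lam) = lam * SDA^2 + (1 - lam) * DP for decreasing lam."""
    z = [0.0] * n  # flat drawing: z = 0 for all points
    for lam in lambdas:
        def energy(v, lam=lam):
            return lam * sda_squared(v) + (1.0 - lam) * dp(v)
        for dz in steps:
            z = descend(energy, z, dz)  # restart from the previous solution
    return z
```

As a toy check of the mechanics, with the stand-in terms `sda_squared = lambda v: (v[0] - 2) ** 2` and `dp = lambda v: (v[0] - 1) ** 2` the early stages are pulled toward the first term's minimum and the final stage settles near the second term's minimum. These stand-ins have nothing to do with line drawings; they only show how the weighting shifts.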

We define the deviation from planarity term, DP,
as the sum of terms DP_f, where DP_f is zero when face
f is planar, and increases as the face deviates from
planarity. We have found two useful definitions of the
DP_f. The first is a strong planarity term that will not
allow a face to fold from one planar configuration to
another planar configuration, but applies only to convex
faces. To see how a face can fold from one planar
configuration to another one within the context of the
optimization we are performing, consider a line drawing
of a square. When all of the z values of the vertexes
are zero, the face is planar. By letting the z values of
the first and third vertexes become arbitrarily large, the
face “folds” into a configuration that, in the limit, is
also planar. In order to detect and avoid this folding
whenever possible, we define DP_f to be the following
function (DP1) whenever face f is convex in the line
drawing (DP1 is based on item 6 in appendix B):




Let n be the number of sides in the face, and α_j be
the angle at the jth vertex. Then,

$$\mathrm{DP1} = \Bigl[(n - 2)\pi - \sum_{j=1}^{n} \alpha_j\Bigr]^2$$

A weaker measure of planarity, DP2, applicable to
all faces, is based on the observation that the normals
defined by consecutive pairs of lines should
lie in the same direction (this is analogous to the notion
of torsion for a curve):

$$\mathrm{DP2} = \sum_{j} \left[ 1 - \left[ \frac{(l_{j-1} \times l_j) \cdot (l_j \times l_{j+1})}{|l_{j-1} \times l_j|\,|l_j \times l_{j+1}|} \right]^2 \right]$$

where l_j is the jth line of planar face f and j - 1 and
j + 1 refer to the previous and next lines in the face,
respectively (i.e., the subscripts are taken modulo the
number of lines in the face).

The combined DP term is the sum of: (1) the sum of DP1 over all
convex faces, and (2) the sum of DP2 over all nonconvex faces
divided by the number of angles in all of the nonconvex faces.
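The planarity terms can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that DP1 is the squared deficit between (n - 2)π and the sum of the 3D vertex angles of a face, and that DP2 sums one minus the squared cosine between normals of consecutive line pairs; both forms are a reading of the text above. The function names and the representation of a face as an ordered list of (x, y, z) tuples are hypothetical, and faces with collinear consecutive lines are not handled.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def vertex_angles(face):
    """3D angle (radians) between the two lines meeting at each vertex."""
    n = len(face)
    angles = []
    for j in range(n):
        u = _sub(face[j - 1], face[j])
        v = _sub(face[(j + 1) % n], face[j])
        c = _dot(u, v) / (_norm(u) * _norm(v))
        angles.append(math.acos(max(-1.0, min(1.0, c))))
    return angles

def dp1(face):
    """Assumed form: squared angle-sum deficit; zero for a planar convex face."""
    n = len(face)
    return ((n - 2) * math.pi - sum(vertex_angles(face))) ** 2

def dp2(face):
    """Assumed form: sum of 1 - cos^2 between consecutive line-pair normals."""
    n = len(face)
    lines = [_sub(face[(j + 1) % n], face[j]) for j in range(n)]
    total = 0.0
    for j in range(n):
        a = _cross(lines[j - 1], lines[j])
        b = _cross(lines[j], lines[(j + 1) % n])
        c = _dot(a, b) / (_norm(a) * _norm(b))
        total += 1.0 - c * c
    return total

def combined_dp(convex_faces, nonconvex_faces):
    """DP1 over convex faces plus DP2 over nonconvex faces per nonconvex angle."""
    total = sum(dp1(f) for f in convex_faces)
    n_angles = sum(len(f) for f in nonconvex_faces)
    if n_angles:
        total += sum(dp2(f) for f in nonconvex_faces) / n_angles
    return total

flat_square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
folded_square = [(0, 0, 1), (1, 0, 0), (1, 1, 1), (0, 1, 0)]
```

Under these assumptions, both terms vanish for `flat_square`, while raising the first and third vertexes (`folded_square`) makes every vertex angle 60 degrees, so both terms become positive.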


3.3 Results

Figures 2 through 6 illustrate the results of our planarity
enforcing MSDA algorithm, and allow one to compare them with
both Marill's reconstructions and the original 3D objects that
were used to generate the line drawings. The "original 3D
objects" presented in our figures are the psychologically
plausible solutions that we expect the program to recover. We
started with actual 3D wire frames, rather than arbitrary line
drawings, as an experimental expedient, since most random line
drawings will not induce the perception of a 3D configuration
in human subjects.

The reconstructions are illustrated both graphically (as two
views in the upper third of each figure) and in tabular form in
the lower third. The first column of the table lists the z
coordinates of each object, the second column is the range of
lengths of the lines of each object, the third column is the
mean and range of the angles formed by all line pairs meeting at
a common vertex, the fourth column is the standard deviation of
angles (SDA) of each object, and the fifth column is the
deviation from planarity (DP) of each object. To simplify the
comparison of the results, the recovered z coordinates have been
normalized so that the first point always has z = 0, and the
second coordinate is always positive (this normalization
procedure has no effect on the objective function).
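The normalization of the recovered depths can be sketched as follows. This is an illustrative reading of the sentence above, assuming that "the second coordinate" means the z value of the second vertex; the function name `normalize_depths` is hypothetical.

```python
def normalize_depths(z):
    """Shift depths so the first vertex has z = 0, then flip the sign of all
    depths if the second vertex would otherwise have a negative z."""
    shifted = [zi - z[0] for zi in z]
    if len(shifted) > 1 and shifted[1] < 0:
        # Adding 0.0 avoids printing the first entry as -0.0 after the flip.
        shifted = [-zi + 0.0 for zi in shifted]
    return shifted
```

Both operations leave every 3D angle of an orthographic reconstruction unchanged, which is why they cannot affect an objective built from angles alone.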

We also applied our algorithm to examples A through I from
Marill's paper. Since his algorithm produced approximately
planar-faced solutions by itself in all cases but example I, it
isn't surprising that our algorithm produced solutions almost
identical to his. The greatest deviation from his result was for
example I, because Marill's algorithm recovered a significantly
nonplanar face for the leftmost face of the line drawing.

In all of the examples, the Δzs we used for Marill's algorithm
(both as a stand-alone algorithm and within the continuation
method) were 0.125, 0.0625, 0.03125, 0.015, and 0.007. We used a
smaller initial Δz than Marill suggests because the larger one
often forced the algorithm out of the valley of attraction of
the current local minimum. Decreasing Δz by a factor of two
generally allowed the algorithm to run in the fewest number of
iterations. Using a smaller final Δz allowed the algorithm to
produce significantly more accurate solutions. In the
continuation method, λ was started at 0.25, and was decreased by
a factor of two a total of ten times.
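The step-size and λ schedules can be illustrated with a small sketch. The coordinate-wise descent below is a generic stand-in, not Marill's exact procedure, and nesting the full Δz schedule inside each value of λ is an assumption. The names `descend`, `lambda_schedule`, `continuation`, and `objective_for` are hypothetical, and the last two step sizes are taken as read from the text.

```python
def descend(objective, z, step):
    """Move one vertex depth at a time by +step or -step, keeping any move
    that lowers the objective, until a full sweep yields no improvement."""
    z = list(z)
    best = objective(z)
    sweeps = 0
    improved = True
    while improved:
        improved = False
        sweeps += 1
        for i in range(len(z)):
            for delta in (step, -step):
                trial = list(z)
                trial[i] += delta
                value = objective(trial)
                if value < best:
                    z, best, improved = trial, value, True
    return z, sweeps

# Step sizes as read from the text (roughly halved each time).
DELTA_Z = (0.125, 0.0625, 0.03125, 0.015, 0.007)

def lambda_schedule(start=0.25, halvings=10):
    """Start at 0.25 and halve ten times, giving eleven values of lambda."""
    return [start / 2 ** k for k in range(halvings + 1)]

def continuation(objective_for, z0):
    """objective_for(lam) must return a function of the depth list z.
    Each lambda stage starts from the solution of the previous stage."""
    z = list(z0)
    for lam in lambda_schedule():
        objective = objective_for(lam)
        for step in DELTA_Z:
            z, _ = descend(objective, z, step)
    return z
```

The key design point is that the solution found at one value of λ seeds the next, so the search is steered gradually rather than restarted.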

Example J (figure 2) illustrates Marill's reconstruction for a
line drawing of a rectangular hexagonal prism. This
reconstruction not only appears psychologically implausible from
these two views, but, as we discuss in the following section,
the reconstructed object does not appear rigid when rotated in
real time. It would appear that at least part of the reason for
this result is that the recovered faces are clearly nonplanar,
as shown by the value of DP in the table. The reconstruction
obtained by using the planarity enforcing MSDA algorithm is
almost identical to the original hexagonal prism.

In example K, we see that the MSDA principle is ambiguous for
simple line drawings. Marill's reconstruction takes the line
drawing of a planar hexagonal plate (SDA = 0.0) and reconstructs
a nonplanar object, also with SDA = 0.0. By enforcing planarity,
however, our reconstruction is quite close to the original
hexagonal plate.

In examples L and N, we see further evidence that the MSDA
principle by itself is inadequate for even simple line drawings.
In both examples, Marill's reconstruction has a significantly
lower SDA than the original object, and we consider both of
these reconstructions to be psychologically implausible. Our
reconstruction of example L is quite close to the original
object, modulo an additive constant and flip of the z
coordinates of the second object (which is invisible to the
objective function). Example N is a fairly ambiguous figure, and
our reconstruction favored a solution with all angles close to
90 degrees (the original object had a "hinge-angle" of 45
degrees). Because of the ambiguity of the figure, there exists a
family of reconstructions that we consider psychologically
plausible, including ours.

Example M shows the reconstruction of a figure for which some of
the planar faces are not equiangular. Again, because some of the
faces had more than four sides, Marill's algorithm failed to
recover a psychologically plausible object. Our reconstruction
is reasonably good, but it did adjust the right angles in the
large face by as much as 13 degrees in order to make the angles
in that face closer to being equal. Nonetheless, we consider the
reconstruction to be psychologically plausible.


3.4 Stability and Robustness of the Planarity Enforcing MSDA
Algorithm


We have examined the stability and robustness of our algorithm
in two ways. The first was to examine the behavior of the
algorithm applied to different projections of the same 3D
objects, but always using the same initial conditions for the
optimization, namely z = 0 for all vertexes. The second was to
examine the behavior of the algorithm for different initial
conditions.

We ran the planarity enforcing MSDA algorithm on at least 32
randomly chosen projections of the 3D objects used to create the
line drawings of examples A through N. For virtually every
projection of each of these objects, the algorithm reconstructed
the object as well as it did for the original projection. For
example, figure 7 shows nine projections and the corresponding
reconstructions for the hexagonal prism (Marill's algorithm
failed for all of these projections). An example of a near
failure is shown in figure 8, where the eighth projection of the
staircase is almost in special position, producing the largest
error, and using the greatest number of iterations. In fact,
when the rule adding all pairs of parallel lines as planar faces
is removed, the algorithm leaves the z values virtually
unchanged from their initial values (not illustrated here). In
summary, in approximately 500 trials, either the planarity
enforcing MSDA algorithm correctly reconstructed the original
object, or it left the line drawing as an "uninterpreted" flat
object.

Fig. 7. Nine projections of the hexagonal prism, and our
corresponding reconstructions. The projections used as original
line drawings are shown in the lower left-hand corner of each
group of four. The original line drawing is annotated by: (1)
the projection number; (2) the letter S when the planar faces
found for that line drawing were the same as for the original
projection, and D otherwise; (3) the number of iterations
required for convergence; and (4) the largest difference between
corresponding angles in the reconstruction and the original
object, in degrees. The other three line drawings are three
views of the reconstruction.
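Test inputs of this kind can be generated by rotating a 3D wire frame at random and dropping the depth coordinate. The sketch below is one way to do so and is not taken from the paper: drawing the rotation from a normalized Gaussian quaternion is an assumption about how "randomly chosen projections" might be produced, and all function names are hypothetical.

```python
import math
import random

def random_rotation(rng):
    """Rotation matrix from a random unit quaternion (normalized Gaussian
    4-vector), which is uniformly distributed over 3D rotations."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    n = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / n for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def orthographic_projection(vertices, rotation):
    """Rotate each 3D vertex, then keep only (x, y) as the line drawing."""
    drawing = []
    for v in vertices:
        x = sum(rotation[0][k] * v[k] for k in range(3))
        y = sum(rotation[1][k] * v[k] for k in range(3))
        drawing.append((x, y))
    return drawing

def random_projections(vertices, count=32, seed=0):
    """Return `count` line drawings of the same wire frame; the line list is
    unchanged by projection, so only vertex coordinates are returned."""
    rng = random.Random(seed)
    return [orthographic_projection(vertices, random_rotation(rng))
            for _ in range(count)]
```

Each drawing would then be reconstructed from z = 0 for all vertexes and compared with the original object, for example by the largest difference between corresponding angles.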

By comparison, the MSDA algorithm is relatively unstable, even
for the line drawings one might expect it to get right. For
example, figure 9 shows nine projections and the corresponding
reconstructions using the MSDA algorithm, for a cube in which
all of the angles should be exactly equal. Note that projections
1 and 9 produce psychologically implausible reconstructions.

In a second set of experiments, we used a random-number
generator to provide twenty sets of initial zs in the range -1
to 1 for examples A through N. With the exception of example D,
which was always correctly reconstructed, the MSDA algorithm
failed to converge to a psychologically plausible solution in at
least four of the twenty trials on each of the other line
drawings, and produced an average of ten failures per line
drawing. In other words, the SDA term by itself has many local
minima that descent algorithms will fall into.

On the other hand, the planarity enforcing MSDA algorithm
succeeded in converging to a psychologically plausible solution
in all trials but one (it failed in one trial of example N, the
hinge). This extremely robust performance was somewhat
unexpected. We believed that the initial condition, z = 0 for
all vertexes, was an important component of the continuation
method. However, it would appear from the results of these
experiments that the imposition of the planarity term in the
continuation method severely curtails, or eliminates,
psychologically implausible minima. One might conjecture that,
for most line drawings, there is one (or perhaps a very few)
psychologically plausible local minima in the SDA when the zs
are constrained to a planar orthographic extension.

Fig. 8. Nine projections of the staircase, and our corresponding
reconstructions. Note that the eighth projection is very nearly
in special position, with many vertexes and lines overlapping in
the line drawing. The continuation method had the largest error
and used the greatest number of iterations for this case. When
the rule adding all pairs of parallel lines as planar faces is
removed, the continuation method prefers the original line
drawing (all z constant) as the interpretation, which is
certainly psychologically plausible.

Fig. 9. Nine projections of the cube, and the corresponding
reconstructions using Marill's algorithm. Note that projections
1 and 9 produced psychologically implausible reconstructions.


3.5  Reconstruction  Time 

The specific descent algorithm defined by Marill, and described
here, has the nice property that it's easy to describe and easy
to implement, no matter what the objective function may be;
however, it is typically quite inefficient. One of the better
descent algorithms is the conjugate gradient algorithm. To
estimate achievable run times, we implemented the conjugate
gradient algorithm described in Numerical Recipes (Press et al.
1986). The algorithm requires an objective function (in this
case, E(λ)) and the gradient of the objective function (in this
case, a function that returns a vector whose i-th element is the
partial derivative of E(λ) with respect to z_i). Analytically
deriving the gradient of E(λ) is rather painful, so instead we
used a simple numerical approximation; this involves evaluating
the objective function for each vertex, which is expensive. A
more efficient implementation that only recomputes those
components of the objective function that change when a given
vertex changes could reduce the following run times by a factor
of four or better.
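A simple numerical gradient of the kind described can be sketched with forward differences. The difference scheme and the step `h` are assumptions, since the text does not say which approximation was used; `numerical_gradient` is a hypothetical name.

```python
def numerical_gradient(objective, z, h=1e-6):
    """Forward-difference estimate of the gradient with respect to each
    vertex depth. Costs one objective evaluation per vertex plus one."""
    base = objective(z)
    gradient = []
    for i in range(len(z)):
        bumped = list(z)
        bumped[i] += h
        gradient.append((objective(bumped) - base) / h)
    return gradient
```

Every call re-evaluates the whole objective once per vertex, which is the expense noted above; recomputing only the terms that involve the perturbed vertex would avoid most of that work.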

Table 1 gives the number of iterations/run time (in seconds) for
three example line drawings. These experiments were run on a
Symbolics 3643, so we would expect about a factor of ten
improvement if algorithms were implemented in C on a modern
workstation such as a SUN SPARC-2 (according to some simple
benchmarks that we ran). The last column gives the expected run
time for an optimized conjugate gradient algorithm running on a
SPARC-2.


Table 1. Number of iterations/run time for three examples.

Example          Original Descent   Conjugate Gradient   Conjugate Gradient
                 on Symbolics       on Symbolics         on SPARC-2

Cube             [illegible]        15/15                15/0.375
Tetrahedron      46/9               14/16                14/0.4
Hexagon prism    406/1306           33/186               33/4.65

Note that the conjugate gradient algorithm improves the run-time
considerably for all but the simple tetrahedron line drawing. On
a SPARC-2, the run times are such that the time required to
reconstruct a line drawing is small relative to the time it
would take to manually enter the drawing. That is, the run times
are well within "interactive time."


3.6 A Reduced Search Space Technique for Obtaining
Exact Planar MSDA Reconstructions

In the planarity enforcing MSDA algorithm described in section
3.2, planarity is not strictly enforced, but rather,
nonplanarity is penalized during the optimization process. This
approach almost always produces faces that are very nearly
planar at the end of the optimization process. There is a very
efficient way to strictly enforce planarity during the MSDA
optimization for line drawings of strictly planar-faced wire
frames, described below. The problem with this approach is that
if the line drawing does not actually correspond to a
planar-faced wire frame, or if the line drawing is not accurate,
the resulting reconstruction will typically be psychologically
unacceptable; we lose the graceful degradation provided by the
planarity enforcing MSDA.

The following method for strictly enforcing planarity is based
on the observation that there are far fewer degrees of freedom
in a planar-faced object than there are vertexes (to
reemphasize, this method is only applicable to line drawings of
strictly planar-faced wire frames). One way of expressing this
observation is in terms of a subset of vertexes, that we call
the free vertexes, whose z values uniquely determine the z
values of all of the other dependent vertexes by virtue of the
planarity of certain faces. For instance, given the planar faces
of the hexagonal prism of figure 2, specifying the depths of the
four vertexes 0, 1, 2, and 6 uniquely determines the depths of
the other vertexes: the depths of vertexes 3, 4, and 5 are
determined by constraining them to lie on the same planar face
as vertexes 0, 1, and 2; similarly, vertex 11 is determined by
vertexes 0, 5, and 6; vertex 10 by vertexes 4, 5, and 11; and
vertexes 7, 8, and 9 by vertexes 6, 10, and 11.

Having determined the free vertexes, one can then apply the MSDA
principle to the reduced search space. For the case of the
simple descent algorithm, the only change to the algorithm is
that only the free vertexes are directly modified during the
optimization, and that the depths of all of the dependent
vertexes are recomputed whenever a free vertex is modified.
Applying this method of free vertexes to the hexagonal prism
reduces the number of iterations from 406 to 39, and the run
time from 1306 seconds to 47 (the run time is reduced by a
greater proportion than the number of iterations because the DP
term has effectively been removed from the objective function).

Thus, the advantage of using the method of free vertexes is that
it reduces the search space and run times considerably,
oftentimes by an order of magnitude or more. The disadvantage of
using this approach is that, unlike the planarity enforcing MSDA
algorithm, it requires a virtually perfect line drawing of a
planar-faced object to ensure that the resulting reconstruction
is planar. For example, adjusting the (x, y) coordinates of even
one vertex by a small amount in a line drawing such as the cube
(example A), can cause the 3D wire frame to be highly nonplanar
for some choices of z coordinates of the free vertexes.
Consequently, the method of free vertexes can produce
reconstructions that are not psychologically plausible.
Nonetheless, there are certain situations in which this approach
can be effective, both for special kinds of line drawings, and
for line drawings that are first processed to make them precise
projections of the intended 3D object.
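The recomputation of dependent depths can be sketched as follows. Each dependent vertex is placed on the plane z = a*x + b*y + c through three vertexes whose depths are already known. This is a minimal sketch, not the authors' code: the rule list encodes the hexagonal prism example as read from the text, and the names `plane_through`, `fill_dependent_depths`, and `HEX_PRISM_RULES` are hypothetical.

```python
def plane_through(p0, p1, p2):
    """Coefficients (a, b, c) of the plane z = a*x + b*y + c through three
    3D points whose (x, y) positions are not collinear in the drawing."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = p0, p1, p2
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if abs(det) < 1e-12:
        raise ValueError("points are collinear in the line drawing")
    a = ((z1 - z0) * (y2 - y0) - (z2 - z0) * (y1 - y0)) / det
    b = ((x1 - x0) * (z2 - z0) - (x2 - x0) * (z1 - z0)) / det
    return a, b, z0 - a * x0 - b * y0

# (dependent vertex, three determining vertexes), in an order such that every
# determining vertex is free or has already been computed.
HEX_PRISM_RULES = [
    (3, (0, 1, 2)), (4, (0, 1, 2)), (5, (0, 1, 2)),
    (11, (0, 5, 6)),
    (10, (4, 5, 11)),
    (7, (6, 10, 11)), (8, (6, 10, 11)), (9, (6, 10, 11)),
]

def fill_dependent_depths(xy, free_z, rules):
    """xy: (x, y) per vertex; free_z: {vertex: z} for the free vertexes.
    Returns the full list of depths with strictly planar faces."""
    z = dict(free_z)
    for dependent, (i, j, k) in rules:
        a, b, c = plane_through(*[(xy[v][0], xy[v][1], z[v]) for v in (i, j, k)])
        z[dependent] = a * xy[dependent][0] + b * xy[dependent][1] + c
    return [z[v] for v in range(len(xy))]
```

In a descent loop, only the four free depths would be perturbed, with `fill_dependent_depths` called after each change before the SDA is evaluated. Because the dependent depths follow exactly from the drawing, small errors in (x, y) propagate directly into the reconstruction, which matches the sensitivity described above.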


4 Implications for Human Vision

Line drawings provide an effective means of communication about
the geometry of 3D objects. It is a matter of some debate as to
whether the interpretation of line drawings is a learned skill,
or whether line drawings are isomorphic to some intermediate
construction of the human visual system (HVS) in its normal
processing of imagery, but in either case an understanding of
how humans interpret line drawings is extremely important in
enabling man-machine communication with respect to images,
diagrams, and spatial constructs. In this section we address two
related questions arising out of the investigation described in
earlier sections: (a) under what conditions is a line drawing
actually given some intended 3D interpretation, and (b) under
what conditions does a moving rigid (wire frame) object actually
appear rigid.

Some, but not all, line drawings are perceived by human subjects
as being three dimensional. What attributes of the drawing
promote such an interpretation, and what are the constraints on
the nature of the resulting 3D construction? Partially because
human introspection is involved, this is a very difficult
question to answer. For example, if the drawing is recognized as
a known or previously encountered 3D object, it might be
visualized this way even though it violates conditions necessary
for an unfamiliar object to be perceived as being three
dimensional. Gestalt psychologists have suggested that if the
drawing offers a simpler construct when seen as three
dimensional than when seen as being flat, it will be perceived
as being three dimensional; however, an effective computational
procedure to evaluate "simpler" has yet to be provided (and
there is also the problem of producing the corresponding 3D
construct). One might consider that minimizing angular variance
is an example of a simplicity principle, but we have not yet
been able to define a formal complexity metric, as was done, for
example, in the work of Leclerc (1989).

It appears to be much more productive to show a human subject a
candidate 3D reconstruction and ask if it corresponds to some
given line drawing than it is to tabulate introspective
judgments about whether objects appear to be 2D or 3D. The
former approach, in fact, is how Marill presents his results to
the reader. Obviously, he can not show an actual 3D
reconstruction, but only a projection. If he showed the
reconstructed object projected without some spatial relocation,
then all we have is the original line drawing back again, and no
determination can be made; Marill shows two projections of his
reconstructed objects, rotated by a few degrees, for evaluation
by the reader. Now we know that every orthographic extension is
a geometrically feasible reconstruction, so on what basis does
the human judge acceptability (i.e., what we have called a
psychologically plausible reconstruction)? It is easy to
hypothesize a whole list of conditions that should be met,
mostly different instantiations of the idea that regularities
(such as parallel lines or equal angles and lengths) observed in
the line drawing are not accidental, and should be preserved in
the reconstructed object; orthographic projective invariants,
such as parallelism, should then also be preserved in the
reprojections of the spatially relocated object. One could write
computational procedures to search for such invariants, but this
approach seems incompatible with the universality of the human
evaluation process (e.g., none of the invariants we happened to
think of may be present in the instances we are considering). A
more powerful idea is to require that the computational
procedure that produced the original reconstruction give the
same result when applied to any of its general position
reprojections; that is, a consistency criterion. This is exactly
the condition that obtains when we observe a moving or rotating
object to be rigid; when we see a (continuous) sequence of
projections that we perceive as being isomorphic to the same
geometric reconstruction, we perceive the object as being rigid.

Applying the above ideas to an evaluation of the MSDA algorithm,
we find two serious deficiencies in the algorithm. First, when
presented with two different orthographic projections of an
object, the MSDA algorithm sometimes fails to recover 3D wire
frames that are even remotely similar to each other (see figure
10). Second, when we use the computer to create a rotating
display of some of the reconstructions obtained with the use of
the MSDA algorithm, we see what appears to be the movement of a
nonrigid object (see figure 11).

Fig. 10. Illustration of the failure of Marill's algorithm to
recover geometrically similar 3D models from two different
projections of the same 3D object. The top row shows the input
line drawing of the 3D object as seen from one viewpoint
(similar to example O), and two views of Marill's reconstructed
object. The bottom row shows the input line drawing of the same
3D object as seen from a different viewpoint, and two views of
Marill's reconstructed object. The two reconstructed objects not
only appear different, but are in fact significantly different
geometrically, as we verified by examining their internal
representation. In contrast, applying our algorithm to both of
these input line drawings, as well as ten other randomly chosen
views, produced reconstructions with an angular error of less
than thirteen degrees from the original object.

Fig. 11. The illusion of nonrigidity for a rotating wire frame
with nonplanar faces. The wire frame, Marill's reconstruction of
example J, is rotated about a vertical axis in the center of the
object. The rotation angle is written in the lower left-hand
corner of each box.

The latter observation led to a number of casual experiments to
determine the factors affecting the perception of nonrigidity in
displays of rotating 3D wire frames. We found that wire frames
with pronounced nonplanar faces (where one would have expected a
planar face from the line drawing) appear to be nonrigid.
Marill's solution for example I (asymmetric solid) does appear
rigid under rotation, even though the faces are slightly warped.
However, his solution is very nearly planar; if we force a bit
more distortion into the solution, the object then appears to
deform under rotation. Thus, it would appear that strict (or at
least near) planarity for the appropriate faces is a necessary
condition for the perception of rigidity.

However, planarity by itself was not sufficient to create a
perception of rigidity. For example, if one chooses random
values for the free vertexes of a certain line drawing (see
section 3.6), one produces an object whose faces are strictly
planar. However, unless the resulting figure is also a local
minimum of the SDA, the resulting 3D wire frame does not appear
rigid when rotated. Similarly, the wire frames of some line
drawings with all of the z coordinates set to zero appeared
nonrigid when rotated (e.g., example A). Furthermore, all of the
hundreds of solutions produced by the planarity enforcing MSDA
algorithm that we looked at appeared rigid under rotation. Thus,
we tentatively conclude that a wire frame must not only be
planar to be perceived as rigid, but must satisfy additional
constraints, such as being a local minimum of the SDA.

5  Future  Work 

There are a number of directions that we have begun to explore
or that we plan on exploring in the near future.

The first of these, for which we have some preliminary results,
is a redefinition of the objective function in which the angles
are partitioned into groups that should be equiangular in 3D.
This becomes necessary either when there are angles in the line
drawing that are not a part of any planar face or when the
angles in a planar face are not all equal in 3D (in either of
these cases, the symmetric preference theorem of appendix D does
not hold). An example of the first case is the hinge (figure 6),
in which angles (1 0 4) and (2 3 5) are not a part of any planar
face. An example of the second case is the truncated box (figure
5), in which angles (1 2 3) and (2 3 4) should be equal to each
other but not equal to the other angles in planar face
(0 1 2 3 4), and similarly for face (5 6 7 8 9).


Fig. 12. An illustration of the need to group angles together
that should be equiangular, rather than applying the planarity
enforcing MSDA principle to all angles. (a) The reconstruction
using an appropriate equiangular grouping. (b) The
reconstruction using the planarity enforcing MSDA principle
applied to all angles.

By changing the definition of the SDA term to be the sum of the
standard deviation of the angles in each equiangular group
(weighted by the number of angles in that group), we have
improved the reconstruction of these two objects considerably.
Defining a simple, yet robust, set of rules that can
automatically determine the equiangular groups for a line
drawing, as we did for the planar faces of the line drawings in
this paper, is still an open question. A simple rule is to group
together all angles that are a part of a convex face. This is
illustrated in figure 12. The reconstruction is accurate to 3
degrees, whereas using the SDA over all angles gives a
relatively poor reconstruction.
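The grouped SDA term can be sketched directly from its description. Several details are assumptions rather than statements from the text: an angle triple (i j k) is taken to mean the angle at vertex j between the lines to i and k, the population standard deviation is used, and the weighted sum is not normalized by the total number of angles. The names `angle_at`, `std_dev`, and `grouped_sda` are hypothetical.

```python
import math

def angle_at(vertices, i, j, k):
    """3D angle (radians) at vertex j between the lines j-i and j-k."""
    u = [vertices[i][d] - vertices[j][d] for d in range(3)]
    v = [vertices[k][d] - vertices[j][d] for d in range(3)]
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def std_dev(values):
    """Population standard deviation (an assumption; see lead-in)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

def grouped_sda(vertices, groups):
    """groups: list of equiangular groups, each a list of (i, j, k) triples.
    Returns the sum over groups of (group size) * (std. dev. of its angles)."""
    total = 0.0
    for group in groups:
        angles = [angle_at(vertices, i, j, k) for i, j, k in group]
        total += len(angles) * std_dev(angles)
    return total
```

With a single group containing every angle, the term reduces to a multiple of the ordinary SDA; splitting the angles into groups removes the pressure to equalize angles that are not meant to be equal, such as the right angles of the truncated box.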

A second direction that we plan on exploring is to implement a
preprocessing step that would take a rough sketch and enforce
various constraints in 2D, such as (1) parallelism between
designated pairs of lines, or between designated lines and axes;
(2) equality in length between designated lines, or between
lines and fixed lengths; and so forth. The paradigm would be
similar to the one for the interpretation of the line drawing,
namely some set of rules would be used to determine which lines
should be parallel or of equal length (with outside intervention
always possible), and an optimization step would then enforce
the constraints while moving as little as possible from the
original line drawing. The ideal is to be able to do as much of
this as possible without intervention from an interactive user.

A third direction is to explore the relationship between what we
have done and previous work in understanding the 3D shape of
curves, such as (Barrow & Tenenbaum 1981; Stevens 1981; Witkin
1981; Barnard & Pentland 1983; Malik & Maydan 1989; Pentland &
Kuo 1990).

An intriguing relationship between Barrow and Tenenbaum's work
on single curves and our work on planar faces is as follows.
They defined the problem of interpreting curves in a manner
similar to the way that we and Marill did: by defining an
objective function over the z coordinates of the object and
minimizing that objective function using a descent algorithm.
Their objective function was the integral of the change in
curvature squared plus the torsion squared. Thus, an ideal curve
for their objective function is a planar circle, since both
terms in the integral are then zero everywhere (when the
end-points are removed from the integral, the arc of a planar
circle is also an ideal curve for their objective function).
Analogously, one of the ideal curves for our definition is a
regular planar polygon (or an arc of a regular planar polygon),
since then both the SDA and DP are zero. Thus, the similarities
are that the SDA plays a role similar to the integral of squared
change in curvature, and the DP plays a role similar to the
integral of squared torsion. Some of the differences are that
both the SDA and DP1 terms are global measures of symmetry and
planarity, while the curvilinear measures are integrals of local
measures. A second difference is that the SDA is also zero for
some nonregular and even nonconvex polygons.

Pentland and Kuo (1990) applied Barrow and Tenenbaum's idea to
distinctly nonplanar curves and surfaces by leaving out the
torsion component. It is somewhat surprising that this worked
since both Barrow and Tenenbaum's and our own experience
indicate that planarity is a key ingredient in making the
optimization approach work. We will explore this question in the
near future.

Finally, we would like to find some computationally effective procedure for using the rigidity under rotation criterion in the 3D recovery process, rather than as a final check on proposed solutions.


6 Discussion

Traditional blocks-world problems are mathematical in nature; they deal with issues of existence and consistency based strictly on geometric considerations; they make no reference to what people actually see. The problem defined by Marill is psychological: since every line drawing has an infinite number of mathematically valid orthographic extensions and no invalid ones, on what basis does the HVS select a particular extension as being psychologically acceptable? Marill proposed an intriguingly simple criterion for duplicating human preference, but we have shown that, while it often produces an acceptable answer, it is unreliable even in very simple situations.

Marill's work has similarities to the Huffman-Clowes-Waltz approach that focused on how polyhedral vertexes can appear in a line drawing and, hence, the constraints such vertexes impose on the implied 3D model; Marill considers only the constraints implied by line intersections at specified vertexes in the line drawing. Mackworth, Kanade, and Sugihara found it necessary to introduce constraints based on the explicit assignment of vertexes to planar faces. We show here the need for introducing a similar explicit requirement for planarity (actually, in the context of optimizing an objective function, our constraint is soft in that it can be violated). However, in our case, the requirement for planarity is justified on psychological grounds rather than as a means for achieving a geometrically more competent algorithm.

The preference of the HVS to interpret a line drawing as the most symmetric polyhedral (planar-faced) object consistent with the drawing is well established in the psychological literature. Marill appeared to have discovered a simple computational procedure for finding such solutions for any given line drawing, but on closer examination, it became apparent that his MSDA principle does not enforce (or even prefer) planar solutions (note 17). Because of this deficiency, MSDA is unreliable even in very simple situations. We were able to prove (appendix D) that if a planarity preference is explicitly added to the MSDA objective function, then indeed, the nonobvious preference for symmetric solutions is also present. However, we are now forced to address the problem of how to provide the auxiliary information necessary to partition the drawing into the coherent components corresponding to the 3D planar faces. It appears that the HVS selects some subset of the contours in the line drawing as corresponding to the planar faces in the 3D model, and if we do not supply this information to a recovery algorithm (either explicitly or by providing a set of conditions implying the same information), we will fail to recover psychologically acceptable models.

Most of the work in the blocks-world tradition employed perfect labeled line drawings with the assignment of vertexes to faces given as part of the input specifications. If we follow the same approach (although we are not concerned with having perfect line drawings since our recovery method employs optimization, which can tolerate deviations from any of the constraints embodied in the objective function), then we at least have provided a tool for simplifying man-machine communication using the language of line drawings. However, there is obvious theoretical value in understanding the criterion for human selection of the circuits in the line drawing that correspond to planar faces in the 3D model (note 18). In part, this importance is related to the issue of how the HVS recovers the shape of a moving object. Even though there are a few well-known exceptions, it is widely believed that the HVS will assume an object to be rigid and correctly recover its shape if this is indeed the case (note 19). However, the rigid wire frames with nonplanar faces provide a whole class of counter-examples to this belief: they appear to be nonrigid when observed in motion (even at very low speeds where maintaining correspondence of vertexes from one projection to the next is no problem). The nonrigidity appears to result from the HVS making incorrect decisions about how the drawing can be partitioned into planar faces (see appendix E).


7 Summary

Marill's recently published paper claimed that the simple procedure he described could duplicate human judgment in recovering the 3D wire frame geometry of objects depicted in line drawings. He provided some impressive examples, but no theoretical justification to back his claims. In this article, we critically examined the merits of Marill's algorithm, provided at least a partial explanation for its competence, identified weaknesses, showed how it could be improved, and discussed the implications of this work for clarifying some important problems in human perception.

In particular, we provided a number of theorems that show that minimizing the standard deviation of angles is (potentially) a simple and effective method for selecting symmetric solutions when the constraining line drawing (which is the projection of a wire frame that may be incomplete) permits such interpretation. On the other hand, we showed that Marill's algorithm could fail in simple cases, that he employed an optimization procedure that was often too weak to find the correct answer even when it was within the competence of the objective function, and that the algorithm would often produce wire frames with nonplanar faces (something no human would intuitively accept in perceiving a straight-line drawing as a 3D configuration).

We argued that an important condition in testing or evaluating the psychological plausibility of a reconstruction is that its reprojections (after spatial relocation) result in the same object being produced by the recovery algorithm. For the human visual system, this is equivalent to the condition that the recovered object appear rigid when observed during movement or rotation. The perception of rigidity for wire frames appears to be highly correlated with the presence or absence of strongly nonplanar faces. By modifying Marill's objective function to explicitly favor planar-faced solutions, and by using a more competent optimization technique, we were able to demonstrate significantly improved performance in all of the examples Marill provided as well as those additional ones we constructed ourselves. The robustness of our algorithm was demonstrated by obtaining consistent psychologically plausible reconstructions in hundreds of experiments involving variations in viewpoint and initial conditions for the approximately 20 objects in our database.

Acknowledgements

The work reported here was partially supported by the Defense Advanced Research Projects Agency. We gratefully acknowledge the valuable discussions with Aaron Bobick and Thomas Strat regarding both the content and organization of this article. A number of improvements and clarifications in its final version were suggested by Thomas Marill in a private communication.


Notes 

1. Gradient space, originally conceived of by James Clerk Maxwell in 1864 (see Whiteley 1986) and rediscovered by D.A. Huffman, provides only necessary conditions for planar realizability of general polyhedral objects with hidden lines removed, and thus consistent edge labeling is possible for impossible blocks-world and Origami objects. Further, the labeling/recovery algorithms were not always competent to find an existing solution.

2. There were some other problems of lesser significance for our purposes. For example, the algebraic formulation was sensitive to computation round-off errors, and digitization errors in specifying the line drawing; a realizable object could be rejected because of such minor numeric inaccuracies. Sugihara dealt with this problem by adding an optimization step to his algorithm, which could find a feasible reconstruction if the input drawing was an almost correct specification.

3. Marill, on the other hand, calls the set of all possibles the orthographic extension.

4. For simplicity, the vertexes are represented by only two digits of precision in the table. However, we used the full 32-bit precision of the projection in the internal representation used by the algorithms.

5. We note that while there generally can be many different ways of covering a line drawing, those of blocks-world objects with hidden lines removed will be covered uniquely if we demand that the interior of the 2D circuits be free of any lines. We also note that it is not always possible to cover a line drawing with simple closed circuits corresponding to the specified planar faces of a given orthographic extension (see example N). It may also be the case that a given covering has no nontrivial orthographic extension with planar faces as specified, as in example &

6. One face, in example H, is an exception to this statement. However, there are enough other geometric constraints in this particular case to enforce planarity.

7. Because this rule typically produces many additional planar faces, it was not used in figures 2 through 6. For these line drawings, the results are virtually identical with or without these additional planar faces. However, the rule was used in the stability and robustness experiments of section 3.4.

8. The SDA term is first squared to make it commensurate with the DP term. Note that squaring the SDA term has no effect on the minimization when λ = 1 (i.e., the simple MSDA algorithm), because the SDA term is positive, and squaring is a monotonic function of the positive reals.

9. This assumes the line drawing is perfect. We later discuss how such perfect drawings can be obtained in an interactive environment.

10. Since we had only the original line drawing for each of Marill's examples, we used the reconstruction from each line drawing as the 3D object for the random projections.

11. The line lengths for these drawings were approximately in the range of 2 to 5.

12. For all line drawings except the truncated box and the hinge, the largest absolute difference in angles between any trial and the reconstruction with z = 0 was less than one degree. For the truncated box, the largest error was less than fifteen degrees. For the hinge, one of the trials caused the hinge to "fold" with arc-pairs O 0 4) and (2 3 3) going to zero degrees. Otherwise, the largest error was less than seven degrees.

13. Modulo a change in sign in the z coordinates.




14. The set of free variables is by no means unique. For example, any three vertexes from one hexagon plus any vertex from the other hexagon will do for this line drawing. We have a simple algorithm for finding a set of free variables, but have not yet proven that it is correct, so we do not present it here.

15. The successive reconstructions are not independent; to the extent that they allow a range of interpretations, the parameters selected for one interpretation will influence the parameter selections for subsequent ones.

16. Some differences are that Barrow and Tenenbaum considered arbitrary, but known, perspective transformations in their paper, while Marill used only orthographic projections. In either case, the set of free variables is equivalent. In addition, Barrow and Tenenbaum did not consider the use of a continuation method.

17. Marill, of course, only recovers the wire frame. But in the case of a blocks-world object, competent algorithms exist for finding all the valid completions of the wire frame as a solid polyhedral object (Strat 1984, Markowsky and Wesley 1981).

18. As noted in section 3.1 and appendix A, we have made some initial progress toward the solution of this problem and have developed an algorithmic procedure that can successfully handle all of the examples discussed in this paper, but we recognize that this is still far short of a complete solution.

19. For example, by using Ullman's result that three distinct orthographic projections of four noncoplanar points in a rigid configuration are sufficient to uniquely determine the structure and motion up to a reflection about the image plane.


References

Barnard, S.T., and Pentland, A.P., 1983. Three-dimensional shape from line drawings, Proc. 8th Intern. Joint Conf. Artif. Intell., Karlsruhe, W. Germany.

Barrow, H.G., and Tenenbaum, J.M., 1981. Interpreting line drawings as three-dimensional surfaces, Artificial Intelligence 17(1-3):75-116.

Clowes, M.B., 1971. On seeing things, Artificial Intelligence 2(1):79-116.

Draper, S.W., 1981. The use of gradient and dual space in line-drawing interpretation, Artificial Intelligence 17:461-508.

Hochberg, J., and McAlister, E., 1953. A quantitative approach to figural "goodness," J. Exp. Psychol. 46:361-364.

Huffman, D.A., 1971. Impossible objects as nonsense sentences. In Meltzer and Michie, ed., Machine Intelligence Vol. 6, p. 295-323, Edinburgh University Press.

Kanade, T., 1980. A theory of Origami world, Artificial Intelligence 13(3):279-311.

Leclerc, Y.G., 1989. Constructing simple stable descriptions for image partitioning, Intern. J. Comput. Vis. 3(1):73-102.

Mackworth, A.K., 1973. Interpreting pictures of polyhedral scenes, Artificial Intelligence 4(2):121-137.

Malik, J., and Maydan, D., 1989. Recovering three-dimensional shape from a single image of curved objects, IEEE Trans. Patt. Anal. Mach. Intell. 11(6):555-566.

Marill, T., 1991. Emulating the human interpretation of line-drawings as three-dimensional objects, Intern. J. Comput. Vis. 6(2):147-161.

Markowsky, G., and Wesley, M.A., 1981. Fleshing out projections, IBM J. Res. Develop. 25(6):934-954.


Pentland, A., and Kuo, J., 1990. Three-dimensional line interpretation via local processing, Electronic Imaging 1249-30. Also Media Lab Technical Report Dl.

Pomerantz, J.R., and Kubovy, M., 1981. Perceptual organization: an overview. In Perceptual Organization, pp. 423-450, Lawrence Erlbaum Associates, Hillsdale, NJ.

Press, W.H., Flannery, B.P., Teukolsky, S.A., and Vetterling, W.T., 1986. Numerical recipes, the art of scientific computing, Cambridge University Press, Cambridge.

Stevens, K.A., 1981. The visual interpretation of surface contours, Artificial Intelligence 17(1-3):47-74.

Strat, T.M., 1984. Spatial reasoning from line drawings of polyhedra, Proc. DARPA Image Understanding Workshop, New Orleans, p. 230-235.

Sugihara, K., 1982. Mathematical structures of line drawings of polyhedrons - toward man-machine communication by means of line drawings, IEEE Trans. Patt. Anal. Mach. Intell. 4(5):458-469.

Sugihara, K., 1984. A necessary and sufficient condition for a picture to represent a polyhedral scene, IEEE Trans. Patt. Anal. Mach. Intell. 6(5):578-586.

Waltz, D.A., 1972. Generating semantic descriptions from line drawings of scenes with shadows. Technical Report AI-TR-271, MIT.

Whiteley, W., 1986. Two algorithms for polyhedral pictures, Proc. 2nd Annual Symp. Computat. Geometry, p. 142-149.

Witkin, A.W., 1981. Recovering surface shape and orientation from texture, Artificial Intelligence 17:17-47.

Witkin, A.W., Terzopoulos, D., and Kass, M., 1987. Signal matching through scale space, Intern. J. Comput. Vis. 1:133-144.


Appendix A. Psychological Assumptions

The following are some of the basic assumptions that we believe are typically made by people in the reconstructions of wire frames from line drawings, and some constraints relevant to partitioning a line drawing into planar faces. They are known to have rare exceptions.

1. Three-dimensional wire frames, derived from line drawings, have implied planar faces inside subsets of their closed circuits; they can also have struts, such as legs or bracing wires, in or on a planar face. (Strongly nonplanar faces produce psychologically implausible solutions.)

2. Symmetric reconstructions are preferred over nonsymmetric ones.

3. Parallel lines in a line drawing are parallel in space. Lines connecting vertexes falling on two parallel lines are in a common plane with the two parallel lines.

4. Many-sided convex closed contours without internal circuits (in a 2D line drawing) are likely to correspond to the contours of planar faces in the corresponding 3D orthographic extension (see B4). An internal circuit to a convex polygon is defined to be a circuit for which all the vertexes are internal to the polygon, and for which the ends of the circuit lie on nonadjacent vertexes of the polygon.

5. A closed simple contour in a line drawing, without internal lines, corresponds to a planar face in the corresponding 3D reconstruction.

An algorithmic procedure for identifying 3D planar faces in the corresponding 2D line drawing of a wire frame has been constructed by composing the requirements of items 3, 4, and 5 into a single algorithm, as defined in section 3. That procedure is sufficient to deal with all of the examples we discuss here, but is not general enough to handle other cases we can think of.

Appendix B. Projective Invariants

The following are some important projective invariants for planar geometric structures.

1. The sum of the interior angles (measured between 0 and 360 degrees) of a closed planar contour with n sides equals (n - 2)180 degrees. Thus, since a polygon of n sides projects to a polygon of n sides under both orthographic and central projection, the mean value of the interior angles of a given closed planar contour [(n - 2)180/n] is invariant under both orthographic and central projection.

We note that Marill measures angles only in the interval between 0 and 180 degrees. To the extent that we are primarily concerned with equiangular closed contours in the application of the above theorem in explaining and using his results, this discrepancy is irrelevant since all the interior angles of such contours are less than 180 degrees.

2. Consider an angle (two line segments sharing a common end-point) in 3D space and its orthographic projection. We will call the plane containing the angle the source plane, and the plane containing its projection the projection plane. If the angle is translated in the source plane, its projection is also translated, but does not change in magnitude from its original projected value. Now consider a set of n angles lying on a common source plane, such that the sum of these angles is 360 degrees. If it is also the case that the angles can be translated so that when all their vertexes coincide, they exactly span an angle of 360 degrees, then the mean value of the set of angles (360/n) is unaltered under orthographic projections. We will call such a collection of angles a “complete-star.” (Example C, for instance, contains a complete-star consisting of the eight 45-degree angles formed at the corner vertexes by the diagonals with the sides of the square. Example G contains this same configuration in its central plane.) We note that if an essentially infinite number of copies of an angle of d degrees (where 360/d = k and k is an integer) is uniformly distributed in orientation over a plane, then the mean value of the angles under any orthographic projection of the plane is the constant value d.

3. We note that if the angle between two line segments is less than 180 degrees, the angle can be closed to form a triangle, and since triangles are preserved under both orthographic and central projection, an angle of less than 180 degrees will never transform under such projections into one of more than 180 degrees. We will call a closed planar contour convex if the region it bounds is convex. Since a convex contour has all internal angles of less than 180 degrees, a convex planar contour remains convex under both orthographic and central projection.

4. We note that the orthographic projection of an arbitrary nonplanar polygonal space curve, with four or more sides, projects to either a nonsimple or concave curve with a probability (P) that increases with the number of sides:

P  >  1  -  0.5"'^  for  R  i  4 

This expression is based on the following model: Consider a process that generates a chain of 3D random vectors by generating three random numbers for each vector (in spherical coordinates, an angle uniformly distributed between 0 and 360 degrees, a second angle between 0 and 180 degrees, and a length uniformly distributed between 0 and some fixed integer L). As each vector is generated we extend the projection of the developing space curve on the image plane. The process stops after some fixed number of steps, which is determined by choosing a random number in some given range; the curve is now closed by connecting the starting point, which could be the origin of the X-Y plane, to the last point generated and this determines whether the inside is to the left or right as we follow the chain of edges of the projected polygon. We note that the only relevant factor in whether the projected closed contour is convex or concave is the cylindrical angle giving the rotation of each of the random vectors relative to the X axis in the image plane. For more than three sides, there is a 50% probability at each vertex that the inside angle is greater than 180 degrees, which thus produces a concave polygon (the last closing side can be ignored since it does not have the same statistics as the other edges in our random model). Other probabilistic models would give nonidentical, but similar results. The > condition is based on additional considerations, such as the projected curve intersecting itself even though the input specification does not record a vertex at the crosspoint.
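The random-chain model just described lends itself to a quick Monte Carlo sketch. The code below is an illustration under stated assumptions, not the authors' procedure: the second (polar) angle is taken to be uniform, the number of sides is fixed per run rather than drawn at random, and a projected polygon is counted as simple and convex when every vertex turns the same way and the turns add up to exactly one revolution. All function names are hypothetical.

```python
import math
import random

def random_projected_polygon(n_sides, max_len=10.0, rng=random):
    """Project a random 3D chain of n_sides - 1 vectors onto the X-Y plane.
    The closing side (last point back to the origin) is implied."""
    x = y = 0.0
    points = [(x, y)]
    for _ in range(n_sides - 1):
        azimuth = rng.uniform(0.0, 2.0 * math.pi)  # rotation relative to the X axis
        polar = rng.uniform(0.0, math.pi)          # assumed uniform (not stated in the text)
        length = rng.uniform(0.0, max_len)
        r = length * math.sin(polar)               # length of the orthographic projection
        x += r * math.cos(azimuth)
        y += r * math.sin(azimuth)
        points.append((x, y))
    return points

def is_simple_convex(points, eps=1e-12):
    """True if the closed 2D polygon is convex and does not cross itself."""
    n = len(points)
    total_turn = 0.0
    sign = 0
    for i in range(n):
        ax, ay = points[i]
        bx, by = points[(i + 1) % n]
        cx, cy = points[(i + 2) % n]
        ux, uy = bx - ax, by - ay
        vx, vy = cx - bx, cy - by
        cross = ux * vy - uy * vx
        if abs(cross) <= eps:
            return False              # degenerate vertex (collinear or zero-length edge)
        s = 1 if cross > 0 else -1
        if sign == 0:
            sign = s
        elif s != sign:
            return False              # mixed turn directions: concave or self-crossing
        total_turn += math.atan2(cross, ux * vx + uy * vy)
    # Same-direction turns that wind more than once indicate self-intersection.
    return abs(abs(total_turn) - 2.0 * math.pi) < 1e-6

def estimate_p(n_sides, trials=4000, seed=0):
    """Estimated probability that the projection is nonsimple or concave."""
    rng = random.Random(seed)
    bad = sum(
        not is_simple_convex(random_projected_polygon(n_sides, rng=rng))
        for _ in range(trials)
    )
    return bad / trials

if __name__ == "__main__":
    for n in range(4, 9):
        print(n, estimate_p(n))
```

Running the loop prints one estimate per side count; under these assumptions the estimates should rise toward 1 as the number of sides grows, which is the qualitative behavior the model is meant to capture.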

5. Closed four-sided polygonal space curves with 90-degree angles at each vertex are planar contours. To prove this assertion, let the sequence of vertexes be labeled a, b, c, and d. Let the plane containing lines ab and bc (and thus vertexes a, b, and c) be called P1. Since all angles are 90 degrees, cd must lie in a plane (P2) normal to bc at c. Similarly, ad must lie in a plane (P3) normal to ab at a. Vertex d must then lie on the line (La) of intersection of P2 and P3, which is normal to P1. We know one solution is to locate d at the point of intersection (d*) of La and P1 (where a, b, c, and d* form a rectangle). This is the planar solution and we wish to show that no other solution is possible. We note that a second constraint on the location of d is that it must lie on a sphere with diameter ac (i.e., all right angles, with legs passing through points a and c, must be inscribed angles of circles through a and c with diameter ac). We know d* lies on the sphere and P1 is a bisecting plane of the sphere. Thus La is tangent to the sphere at d* and d* is the only possible solution.
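The same conclusion can be cross-checked in coordinates. The following is a sketch under an assumed coordinate frame (our choice, not part of the original proof): b is placed at the origin with a and c on the x and y axes, which already encodes the right angle at b.

```latex
\begin{align*}
&b = (0,0,0), \quad a = (p,0,0), \quad c = (0,q,0), \quad d = (x,y,z), \qquad p, q > 0. \\
&\text{Right angle at } a: \quad (b-a)\cdot(d-a) = -p\,(x-p) = 0 \;\Rightarrow\; x = p. \\
&\text{Right angle at } c: \quad (b-c)\cdot(d-c) = -q\,(y-q) = 0 \;\Rightarrow\; y = q. \\
&\text{Right angle at } d: \quad (a-d)\cdot(c-d) = (p-x)(-x) + (-y)(q-y) + z^{2} = z^{2} = 0 \;\Rightarrow\; z = 0.
\end{align*}
```

Hence d = (p, q, 0) lies in the plane of a, b, and c, and the four vertexes form a rectangle, in agreement with the geometric argument.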

6. A Global Planarity Test for a Space Curve. A planar polygonal curve has a sum of internal angles equal to (n - 2)180 degrees. Thus, if the curve is triangulated using only the existing vertexes along the curve, the sum of the angles of the triangles is also (n - 2)180.

Case 1: Consider a space curve S that projects to a convex planar curve. If the space curve is itself planar, the sum of its angles (measured between 0 and 180 degrees) is (n - 2)180. Assume S is nonplanar, that is, there is a "fold" along one or more edges of some triangulation of its planar projection. Consider the vertex V at the intersection of one such fold (with respect to the implied triangulation T) and S. The plane through the two edges of S meeting at V, and the faces of the triangles of T that have edges intersecting at V, form a polyhedral angle. It is known that any face angle of a polyhedral angle is less than the sum of the other face angles. Therefore, the sum of the angles of the space curve is equal (at vertexes with no folding) or less (at vertexes with folding) than the sum of the angles of the triangles in T (i.e., less than (n - 2)180).

Case 2: If the projection of the space curve S is concave, and we measure angles between 0 and 360 degrees, the sum of the internal contour angles in the planar projection will equal (n - 2)180 as in Case 1. However, while the space angles with projections of less than 180 degrees will decrease at folds, the internal angles greater than 180 degrees will increase (i.e., at vertexes where there are folds, the polyhedral angle in the argument given in Case 1 is now formed for the external angle of S at V). Thus, since some angles will increase and others decrease, we cannot be sure that the curve is planar even if the sum of its internal angles equals (n - 2)180. However, we do have a sufficient condition for nonplanarity. That is, the curve is known to be nonplanar if the sum of its internal angles, measured between 0 and 360 degrees, is not equal to (n - 2)180 degrees.
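The Case 1 version of the global planarity test can be sketched in a few lines. The code assumes a closed space curve given as an ordered list of 3D vertexes whose projection is convex, with each angle measured between 0 and 180 degrees; the function names and the sample curves are illustrative assumptions.

```python
import math

def angle_sum_deg(points):
    """Sum of the angles (degrees, each in 0..180) at the vertexes of a
    closed polygonal space curve given as a list of 3D points."""
    n = len(points)
    total = 0.0
    for i in range(n):
        p_prev, p, p_next = points[i - 1], points[i], points[(i + 1) % n]
        u = [a - b for a, b in zip(p_prev, p)]
        v = [a - b for a, b in zip(p_next, p)]
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(a * a for a in v))
        total += math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
    return total

def planarity_deficit(points):
    """(n - 2) * 180 minus the angle sum: about zero for a planar curve with a
    convex projection, positive when the curve is folded (Case 1)."""
    return (len(points) - 2) * 180.0 - angle_sum_deg(points)

flat = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
tilted = [(0, 0, 0), (1, 0, 0.5), (1, 1, 1.2), (0, 1, 0.7)]  # planar: z = 0.5x + 0.7y
folded = [(0, 0, 0), (1, 0, 0), (1, 1, 1), (0, 1, 0)]        # one vertex lifted out of plane

if __name__ == "__main__":
    for name, curve in [("flat", flat), ("tilted", tilted), ("folded", folded)]:
        print(name, round(planarity_deficit(curve), 6))
```

All three curves project to the same unit square. The two planar ones have a deficit of about zero, while the folded one has a deficit of 30 degrees (its angles are 90, 90, 60, and 90), so it is flagged as nonplanar.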

Appendix C. A Partition Theorem

The variance of a set S of n objects $\{a_i\}$ is defined as

$$V = \frac{1}{n}\sum_{i=1}^{n}(a_i - M)^2$$

where

$$M = \frac{1}{n}\sum_{i=1}^{n} a_i .$$

Let us now partition the $\{a_i\}$ into k subsets, such that the subset $S_j$ has $n_j$ elements and mean $M_j$ where:

$$M_j = \frac{1}{n_j}\sum_{S_j} a_i .$$

Let $V_j$ be the variance of $S_j$ about $M_j$ and let $\Delta_j = (M - M_j)$.

Theorem:

$$V = \frac{1}{n}\sum_{j=1}^{k} n_j\left[V_j + \Delta_j^2\right]$$

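Proof: The expression for V can be rewritten as

$$V = \frac{1}{n}\left[\sum_{S_1}\big[(a_i - M_1) - \Delta_1\big]^2 + \sum_{S_2}\big[(a_i - M_2) - \Delta_2\big]^2 + \cdots + \sum_{S_k}\big[(a_i - M_k) - \Delta_k\big]^2\right]$$

since $a_i - M = (a_i - M_j) - \Delta_j$ for every $a_i$ in $S_j$. Expanding the square for each subset gives

$$\sum_{S_j}\big[(a_i - M_j) - \Delta_j\big]^2 = \sum_{S_j}(a_i - M_j)^2 - 2\Delta_j\sum_{S_j}(a_i - M_j) + n_j\Delta_j^2 .$$

Given that $\sum_{S_j} a_i = n_j M_j$, the middle term vanishes, and the first term equals $n_j V_j$ by the definition of $V_j$. Thus, for each subset,

$$\sum_{S_j}(a_i - M)^2 = n_j\left[V_j + \Delta_j^2\right].$$

And

$$V = \frac{1}{n}\sum_{j=1}^{k} n_j\left[V_j + \Delta_j^2\right].$$

QED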
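The partition theorem is easy to confirm numerically. The sketch below assumes population variances (division by the number of elements, as in the definition above); the function names and the sample angle sets are illustrative only.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def partitioned_variance(subsets):
    """Variance of the union of the subsets, computed from the per-subset
    variances V_j and the mean offsets Delta_j = M - M_j."""
    n = sum(len(s) for s in subsets)
    grand_mean = sum(sum(s) for s in subsets) / n
    total = 0.0
    for s in subsets:
        mean_j = sum(s) / len(s)
        delta_j = grand_mean - mean_j
        total += len(s) * (variance(s) + delta_j ** 2)
    return total / n

# Hypothetical angle sets (degrees) for three contours or stars.
subsets = [[90.0, 90.0, 90.0, 90.0], [60.0, 60.0, 60.0], [45.0, 135.0, 45.0, 135.0]]
all_angles = [a for s in subsets for a in s]

if __name__ == "__main__":
    print(variance(all_angles), partitioned_variance(subsets))
```

Both numbers printed by the script agree up to round-off, whatever partition of the angles is chosen.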

Appendix D. Symmetry Preference Theorem

Recall that

1. In appendix B we showed that the average angle of all planar orthographic extensions of a given simple closed 2D contour is the same, and that the average angle of all planar orthographic extensions of a complete-star is also the same;

2. in appendix C we proved a theorem that allows us to compute the SDA of a set of simple closed planar contours (and/or complete-stars) as the sum of two components. The first component is the variance of the angles in a contour or star about the mean angle of that contour or star, summed over all contours and stars. The second component is a weighted sum of the squared differences between the mean angle of each contour and star, and the average of all the angles under consideration.

By (1), the second component of the variance is constant over all planar orthographic extensions because (a) the mean of each contour and star is constant over all such extensions, and (b) the mean of all angles can be computed as the weighted sum of the mean of each contour and star.

Consequently, if we restrict our attention to the planar orthographic extensions of a line drawing, then by (2) above, only the first component of the variance will change over the extensions. Since the first component is zero for an extension comprising only equiangular planar contours and stars (such as the solutions for examples A, B, C, G, J, K, and L), and since it is positive otherwise, then such symmetric solutions correspond to the global minimum of the SDA over all planar orthographic extensions.
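The argument can be illustrated for a single contour with a short numerical sketch. It assumes that the planar orthographic extensions of a unit-square drawing are obtained by lifting each vertex onto a plane z = ax + by; the parametrization, the function names, and the sample slopes are illustrative assumptions rather than part of the original analysis.

```python
import math

def vertex_angles(points):
    """Angles in degrees (0 to 180) at the vertexes of a closed 3D polygon."""
    n = len(points)
    angles = []
    for i in range(n):
        p_prev, p, p_next = points[i - 1], points[i], points[(i + 1) % n]
        u = [a - b for a, b in zip(p_prev, p)]
        v = [a - b for a, b in zip(p_next, p)]
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(a * a for a in v))
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, dot / norm)))))
    return angles

def mean_and_sda(points):
    """Mean and standard deviation of the vertex angles."""
    angles = vertex_angles(points)
    mean = sum(angles) / len(angles)
    sda = math.sqrt(sum((a - mean) ** 2 for a in angles) / len(angles))
    return mean, sda

def planar_extension(drawing, a, b):
    """Lift a 2D drawing onto the plane z = a*x + b*y (assumed parametrization)."""
    return [(x, y, a * x + b * y) for x, y in drawing]

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

if __name__ == "__main__":
    for a, b in [(0.0, 0.0), (0.8, 0.0), (0.5, 0.7), (-1.3, 0.4)]:
        mean, sda = mean_and_sda(planar_extension(square, a, b))
        print(a, b, round(mean, 6), round(sda, 6))
```

In every case the mean angle stays at 90 degrees, as appendix B predicts, while the SDA is zero only for the equiangular extensions. Note that the extension with a = 0.8 and b = 0 is a rectangle in 3D, so it also attains the minimum: the global minimum is shared by all equiangular planar extensions.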

Appendix E. Factors Affecting the Perception of Nonrigidity

If we rotate a randomly derived orthographic extension of almost any of the line drawings used as examples in this article, the object appears nonrigid to most observers (even though, of course, the wire frame is actually a rigid object). While there are many possible explanations for this phenomenon, our conjecture is that it is primarily due to special position projections of the wire frame (that occur at one or more poses in its rotation) that lead the HVS to incorrectly assume that some projective invariant (such as parallel lines, see figure 11) is being observed. This, in turn, causes incorrect expectations about the presence and location of planar faces.

We informally looked at some other possible causative
factors, but did not observe consistent nonrigidity
phenomena. For example, we looked at objects, such
as example N, that produce compelling 3D interpretations
with Necker reversals, but for which the drawing
is incomplete: it does not show all the edges that should
be visible, for example, where planar faces intersect.
There was the possibility that these missing edges in
the 3D model (and thus missing lines in the drawing)
could cause the appearance of a nonplanar-faced object
to be observed. But the hinge, and the few other
objects we looked at in this category, appeared rigid.

We also looked at nonplanar orthographic extensions
of drawings that generally appeared flat, including
blocks-world type drawings that do not have corresponding
polyhedral realizations (such as example O). The
results here were ambiguous. The rotating objects
generally produced illusions of nonrigidity, but since
these objects did not always appear 3D, the illusions
were generally very weak.

Some other casual experiments include cases where
all the lines connecting the vertexes of the wire frames
are deleted; we observed that some of the wire frames
that originally appeared nonrigid now appeared to be
rigid under rotation. And, as a general observation, we
have not encountered any examples in which the wire
frame of a (nondegenerate) blocks-world object appears
nonrigid when in motion.


Appendix B:

Saliency Detection and Partitioning Planar Curves


Saliency Detection and Partitioning Planar Curves*

Martin  A.  Fischler  and  Helen  C.  Wolf 

Artificial  Intelligence  Center 
SRI  International 

333  Ravenswood  Ave.,  Menlo  Park,  CA  94025 
(fischler@ai.sri.com wolf@ai.sri.com)


Abstract 

This paper summarizes the underlying ideas and algorithmic
details of a computer program that performs at
a human level of competence for a significant subset of
the curve partitioning task. It extends and "rounds out"
the technique and philosophical approach originally presented
in a 1986 paper by Fischler and Bolles. In particular,
it provides a unified strategy for selecting and
dealing with interactions between salient points, even
when these points are salient at different "scales of resolution."
Experimental results are described involving
on the order of 1000 real and synthetically generated
images.

Index Terms: computer vision, salient points, critical
points, curve partitioning, curve segmentation, curve description

1.  Introduction 

A critical problem in machine vision is how to break up
(partition) the perceived world into coherent or meaningful
parts prior to knowing the identity of these parts. Almost
all current machine vision paradigms require some
form of partitioning as an early simplification step to
avoid having to resolve a combinatorially large number
of alternatives in the subsequent analysis process. Given
this critical role for partitioning as a functional requirement
of a complete vision system, it is a major challenge
to find some significant subset of the partitioning problem
for which an algorithmic procedure can duplicate
normal human performance. This paper (a compressed
version of a much longer document which will appear
in IEEE PAMI later this year) summarizes the underlying
ideas and algorithmic details of a computer program
which performs at a human level of competence
for a significant subset of the curve partitioning task. It
extends and "rounds out" the technique and philosophical
approach originally presented in a 1986 PAMI paper
by Fischler and Bolles [Fischler86]. For example, it provides
a unified strategy for resolving conflicts in selecting
among neighboring potential partition points that may
be salient at different "scales of resolution."

* This work was performed under contracts supported by the
Defense Advanced Research Projects Agency.

While  our  focus  in  this  paper  is  on  curve  partitioning 
in a generalized setting (the curves in our experiments
are  mostly  without  semantic  meaning),  and  where  the 
criterion  for  success  is  duplicating  normal  human  perfor¬ 
mance,  finding  salient  points  on  image  curves  (potential 
partition  points)  plays  a  critical  role  in  both  two  and 
three  dimensional  object  recognition,  in  curve  approxi¬ 
mation,  in  tracking  moving  objects,  and  in  many  other 
tasks  in  machine  vision. 

In  many  approaches  to  2-D  object  recognition,  objects 
are  represented  by  their  boundaries,  and  the  recogni¬ 
tion  techniques  depend  (directly  or  indirectly)  on  locat¬ 
ing  distinguished  points  along  the  boundary;  typically 
these  distinguished  points  are  discontinuities  or  extrema 
of  local  curvature  (sometimes  called  “corner  points”)  and 
inflection  points  [e.g.,  Mokhtarian86].  “Corners”  on  the 
contours  of  imaged  objects  are  often  used  as  features  for 
tracking the motion of these objects and for computing
optical flow [e.g., Mehrotra90]. In 3-D recognition,
partitioning is typically one of the first analysis steps -
especially when objects can occlude each other. Hoffman
and Richards [Hoffman82] argue that when 3-D parts are
joined to create complex objects, concavities will generally
be observed in their silhouettes, and that segmentation
of image contours at concavities (the maxima of
negative curvature along the contours) is a good strategy
to decompose (even unmodeled) objects into their
"natural parts."

In cartography, computer graphics, and scene analysis,
it is often desirable to partition an extended boundary
or a contour into a sequence of simply represented primitives
(e.g., straight line segments or polynomial curves
of some higher degree) to simplify subsequent analysis
and to minimize storage requirements [e.g., Teh89].

In  our  own  current  work  concerned  with  delineating 
linear  structures  in  aerial  images,  the  technique  pre¬ 
sented  in  this  paper  was  an  essential  component  of  the 
system  (briefly  described  in  Appendix  C)  that  produced 
the  results  displayed  in  Figure  6. 


2.  Problem  Statement 

In its most general sense, partitioning involves assigning,
to every element of a given "object" set, a label
from a given "label" set. For our purposes in this paper,
the object set is the set of points along a curve (or
contour segment) lying in a prescribed region of a two-dimensional
plane. While we deal with cases where the
points in the object set do not form a continuous digital
curve, in most of our exposition in this paper we
will assume that the curves are continuous¹ and non-intersecting.
Our label set is binary: points will be called
either significant (critical) or non-significant, for some
specified purpose. In Fischler and Bolles [Fischler86],
it is demonstrated (or at least argued) that perceptual
partitioning is not independent of some assumed task
or purpose. In this paper we focus on one of the three
tasks discussed in the above reference: selecting a small
number of points (called critpts) along a curve segment
which could be used as the basis for reconstructing the
curve at some future time. Figure 1 shows the specific
instructions and curves used in one set of relevant experiments
involving human subjects; this figure also shows
the critpts that were selected by the subjects, and the
comparable results produced by our algorithm (called
the Saliency Selection System, or SSS, and discussed in
Appendix B).
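The continuity assumption stated for digital curves (each point has a neighbor whose x and y coordinates differ from its own by at most one) can be illustrated with a short check. This is a minimal sketch under an added assumption: the curve is stored as an ordered list of integer (x, y) pairs, so continuity is tested between consecutive samples. The function name `is_continuous` is hypothetical.

```python
def is_continuous(points):
    """Return True if every pair of consecutive samples is distinct and
    differs by at most one in both x and y (i.e., they are 8-neighbors).

    points: ordered list of integer (x, y) pairs (assumed storage format).
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = abs(x1 - x0), abs(y1 - y0)
        if max(dx, dy) != 1:
            return False  # repeated point or a gap larger than one pixel
    return True


curve = [(0, 0), (1, 1), (2, 1), (3, 0)]
# The binary label set: True marks a point as significant (a critpt).
labels = [False] * len(curve)
```

A curve that fails this check falls into the "not continuous" case that the text mentions but mostly sets aside.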

In  order  to  separate  the  generic  partitioning  criteria 
used  by  human  subjects  from  criteria  based  on  their 
past  experience,  such  as  when  the  subject  is  able  to  as¬ 
sign  a  name  to  the  curve  (e.g.,  the  curve  looks  like  the 
letter "s"), we used "random" curve segments for our
experiments;  the  technique  employed  to  generate  the 
segments  is  described  in  Appendix  A.  We  also  wanted 
to  avoid  having  to  deal  with  the  recognition  of  global 
features  (e.g.,  symmetry  or  repeated  structure,  or  even 
straight  lines  and  analytic  curves)  as  a  condition  for  mak¬ 
ing  critpt  selections;  avoiding  this  problem  is  justified  if 
we  are  correct  in  our  belief  that  local  and  global  anal¬ 
ysis  are  accomplished  by  separate  mechanisms.  In  or¬ 
der  to  deal  with  global  features,  the  complexity  of  any 
solution  would  be  expanded  enormously  since  a  whole 
new vocabulary of such features and their representations
would have to be implemented. The generation
and  use  of  random  curves  took  care  of  this  problem  also 
(i.e.,  it  is  highly  unlikely  that  symmetries  or  repeated 
structure  would  ever  be  generated  by  our  random  pro¬ 
cess). 

¹ Each point of the non-branching, one pixel wide curve, with
coordinates (x,y), has one or more neighbors with x-coordinate in
the set (x+1, x, x-1), and y-coordinate in the set (y+1, y, y-1).

3. Relevance, Prior Work, and Critical
Issues

The partitioning problem has been a subject of intense
investigation since the earliest work began in machine
vision. It has been widely assumed that in order
to reduce the combinatorics of scene analysis to a manageable
level, it is necessary to decompose images into
their meaningful component parts as one of the first steps
in the analysis process. The difficulty arises from the
need to partition the image into parts before we know
the identity of those parts. The underlying assumption
then is that there are generic criteria, independent of the
goal of the analysis, that, if discovered, could be used to
obtain useful (or at least intuitively acceptable) partitioning;
additional problem dependent criteria could
always be added to produce a more relevant result for some
particular purpose.

The partitioning problem becomes progressively
harder as we increase the number of dimensions in which
we are working; in this paper we only address the 1.5-D
problem of partitioning planar curves. A specific criterion
which can form the basis of such partitioning was
originally proposed by Attneave [Attneave54] - points
at which the curve bends most sharply are good partition
points.² This idea has been the starting point for
most of the subsequent efforts in curve partitioning, but
attempts to convert this abstract concept into a computationally
executable procedure that gives intuitively
acceptable results have met with limited success.³ References
[Imai86, Mokhtarian86, Pavlidis74, Rosenfeld73,
Teh89, Wuescher91] are representative of work in this
area.⁴

² Hoffman and Richards [Hoffman82] give convincing evidence
that we should distinguish between positive and negative curvature
maxima. That is, on closed curves, extreme points of negative
curvature - associated with object concavities - have greater
utility as partition points than positive curvature maxima, but the
positive maxima (and inflection points) play an important role in
describing the individual segments.

³ As noted later, most of the work on the curve partitioning
problem, especially recent work, has not been concerned with
duplicating generic human performance, but rather with performing
specific visual tasks having different criteria for success.

⁴ The approach taken by Wuescher and Boyer is distinct in that
they first extract contour segments of approximately constant
curvature and then infer the location of partition points as a secondary
operation.

The main problems we must solve are:

(a) A way of assigning a measure (or degree) of
saliency/criticality⁵ to each point on a curve.
Most investigators have equated sharp bending of
a curve with the mathematical concept of curvature,
but curvature is not well-defined for a finite
sequence of points (which is how our sensor acquired
curves are generally represented). Further,
it is not obvious that the mathematical definition
of curvature is the best computational approximation
to the human criteria for criticality. In Fischler
and Bolles [Fischler86], bending is interpreted
as deviation from straightness - it is closely related
to proposed approximations to mathematical curvature,
as illustrated in Figures 2 and 3, but has
a number of advantages: it is an easily measured
quantity, even for digital curves (i.e., sequences of
coordinate pairs), and as discussed in the next section,
its local extrema are in better accord with human
preference (choices based on approximations
to the definition of mathematical curvature occasionally
include anomalous points as shown in the
examples of Figures 2 and 3).

⁵ We will use the terms saliency and criticality somewhat
interchangeably in this paper. However, saliency can be considered to
be the generic subset of points that are critical for some partitioning
task.

(b) A way of adjusting the criticality of a given curve-point
to take into account its interactions with its
neighbors; i.e., local context. It is obvious that
human subjects will often avoid assigning a critpt
label to both members of a pair of points, even when
both points have high (independent) criticality values,
if the points are close neighbors along the curve.
The basic approach of local non-maximum suppression
is not sufficient, in itself, to duplicate human
performance.

(c) A way of dealing with the interactions between
critpts that are significant at different scales of
resolution. If a human subject looks through a
fixed sized window at the same curve segment displayed
at two different magnifications, the selected
critpts will not always be the same, and the selection
at the lower resolution will not always be a subset of
those at the higher resolution (e.g., Figure 4). This
is in contrast to the commonly held assumption that
critpt assignment should be independent of "scale of
resolution."

(d)  A  threshold  of  significance;  a  minimal  level  of 
criticality  below  which  variations  are  considered  to 
be  noise  and  no  critpt  designations  are  made.  (Some 
investigators reject the idea that any user supplied
parameters  or  thresholds  should  be  necessary.) 
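Issues (b) and (d) can be made concrete with a small sketch of the baseline they refer to: a significance threshold followed by local non-maximum suppression along the curve. This is not the SSS algorithm; the function name `select_critpts`, the `radius` and `threshold` parameters, and the sample scores are all hypothetical. The example is chosen to show one way in which plain suppression falls short: two nearby points with equal scores both survive.

```python
def select_critpts(scores, radius, threshold):
    """Return indices whose score is at least `threshold` and is not
    exceeded by any score within `radius` samples on either side.

    scores: per-point criticality values along the curve (assumed given).
    """
    picked = []
    n = len(scores)
    for i, s in enumerate(scores):
        if s < threshold:
            continue  # below the significance threshold: treated as noise
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        if s >= max(scores[lo:hi]):
            picked.append(i)  # local maximum (ties are kept)
    return picked


scores = [0.1, 0.9, 0.2, 0.9, 0.1, 0.05, 0.3, 0.05]
print(select_critpts(scores, radius=2, threshold=0.5))  # prints [1, 3]
```

Indices 1 and 3 are only two samples apart, yet both are retained because neither exceeds the other. Deciding what to do with such pairs is exactly the kind of local-context question that the text says simple suppression does not answer.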

We  have  addressed  the  above  issues  through  the  solu¬ 
tions  to  a  set  of  subproblems: 

1. Definition of an algorithmic procedure (which is parameterized
to deal with noise and scale) for assigning
criticality values to each point on a curve independent
of decisions made about the locations of
(other) critpts. The solution to this problem, essentially
the procedure given in Fischler and Bolles
[Fischler86], provides answers at a human level of
performance for isolated critpts (i.e., along a section
of a random curve, generated as described in
Appendix A, for which human subjects select only
one critpt). Thus, for the domains we experimented
with (and especially the domain defined in Appendix A),
we were able to assign fixed values to
scale/resolution and noise/significance parameters
so that our program would make the same selections
as human subjects when there was near unanimous
agreement among these subjects. This algorithm is
described in Appendix B.

2.  An  analysis  of  how  geometric  scaling  of  the  in¬ 
put  curve,  and  resolution  specific  operations  on  the 
curve,  can  be  equated,  and  thus  the  development  of 
a  basis  for  normalizing  criticality  scores  across  scale. 

3. Development of a general approach to the problem
of resolving the competition/cooperation interactions
of geometrically related objects based on "local
dominance." The same machinery used to deal
with interactions at a given scale of resolution is
also used to resolve conflicts across different scales
of resolution.

In  the  remainder  of  this  paper,  we  describe  our  so¬ 
lutions  to  the  problems  enumerated  above,  and  then 
present  examples  and  experimental  results  to  justify  the 
design  decisions  we  made  and  to  illustrate  the  perfor¬ 
mance  capabilities  of  our  algorithm. 

4.  Evaluation  of  Saliency 

Saliency  is  a  critical  attribute  (for  description  and 
recognition)  assigned  to  perceived  things  in  the  world 
by  the  human  visual  system  (HVS).  While  an  elusive 
concept  in  general,  task  specific  specializations  of  this 
concept  are  easily  found  that  elicit  consistent  choices 
across  human  subjects.  An  acceptable  computational 
definition of contour/curve saliency must provide⁶

•  The  specification  of  a  procedure  that  quantifies  the 
abruptness  and  extent  of  the  deviation  of  a  curve 
from  its  straight-line  continuation;  a  sharp  bend  is 
more  salient  than  a  shallow  one,  and  the  greater 
the  excursion,  the  more  prominent/salient  the  "fea¬ 
ture.” 

•  Agreement  with  human  judgement  in  terms  of  both 
selection,  and  accuracy  of  placement,  of  the  critical 
points  (in  some  well  defined  context). 

⁶ In this paper we are primarily concerned with saliency based
on local cues; locations on a curve where there is a transition
from one type of curvature behavior to another, e.g. from perfectly
straight to "wiggley," may also be psychologically salient,
but such forms of global saliency are beyond the scope of our current
investigation.

4.1 A Computational Definition of
Saliency

Conventional definitions of curvature present a number
of serious problems with respect to their use as a
saliency measure in computational vision (CV). First,
the mathematical definition is based on the properties
of a curve in the infinitesimal neighborhood about the
point at which curvature is being measured. For the finite
precision quantized curves dealt with in CV, it has
been difficult to find a suitable approximation to the
limiting process originally intended for use on mathematically
continuous curves. Second, it is readily observed
that saliency is not an infinitesimal point property,
but is based on some finite extent of the curve. A
proposed solution to both problems, offered by Rosenfeld
and Johnston [Rosenfeld73], was to find an appropriately
sized segment of the curve about the point in question,
and take a "snapshot" of the limiting process at this
single (implied) scale. That is, rather than the rate of
change of tangent angle with respect to curve length,
R/J proposed measuring the angle between two fixed
length chords, where the lengths correspond to the computed
"natural scale" of the curve about the given point.
We will call this curvature-analog the R/J-Curvature.
There are a number of other definitions of mathematical
curvature (e.g., the limiting radius of a circle whose
three defining points converge at the curve-point in question)
which have analogs that could have been used in
place of the angle measure in R/J-Curvature, but these
definitions are monotonically related, and do not really
present distinct alternatives. Thus, R/J-Curvature is a
suitable representative for the whole class of mathematical
curvature-measure analogs.
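The chord-angle idea behind R/J-Curvature can be sketched in a few lines. This is an illustration only, not the Rosenfeld/Johnston algorithm: the arm length `k` is given in samples rather than derived from a computed "natural scale," the sign convention is a choice made here, and the function name `chord_angle` is hypothetical.

```python
import math


def chord_angle(points, i, k):
    """Signed turning angle, in degrees, at points[i] between the chord
    from points[i - k] to points[i] and the chord from points[i] to
    points[i + k]. Zero means the two chords are collinear; positive
    means a counterclockwise turn.

    Requires k <= i and i + k < len(points).
    """
    (xa, ya), (xb, yb), (xc, yc) = points[i - k], points[i], points[i + k]
    ux, uy = xb - xa, yb - ya  # incoming chord
    vx, vy = xc - xb, yc - yb  # outgoing chord
    cross = ux * vy - uy * vx
    dot = ux * vx + uy * vy
    return math.degrees(math.atan2(cross, dot))


straight = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
corner = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(chord_angle(straight, 2, 2), chord_angle(corner, 2, 2))
```

Because the measure looks at only three samples for one value of `k`, it takes a single "look" at the curve at a single scale, which is the property the surrounding discussion goes on to examine.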

In Fischler and Bolles [Fischler86], our concern was
not to find a good digital analog for curvature, but rather
to find an effective measure of saliency. The quantity
defined in that paper can be viewed as a curvature-extremum
measure in which the limiting process (in
scale) is replaced by a scanning process (in space) more
appropriate to digital curves. The scanning process is
parameterized by scale, and the resulting measure is a
signed quantity which we call F/B-Saliency (F/B-S).
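To give a feel for a scanning measure of "deviation from straightness," the sketch below slides a fixed-span window along the curve and credits, at each placement, the sample that lies farthest from the chord joining the window's end points. This is a simplified stand-in written for illustration; it is not the F/B-S definition, which is given elsewhere (Appendix B and [Fischler86]). The function name `deviation_scores` and the `span` parameter are hypothetical.

```python
import math


def deviation_scores(points, span):
    """Accumulate a signed deviation score for every sample.

    For each window of `span` consecutive samples, the interior sample
    with the largest perpendicular distance from the chord joining the
    window's first and last samples receives that signed distance.
    """
    scores = [0.0] * len(points)
    for start in range(len(points) - span + 1):
        (x0, y0), (x1, y1) = points[start], points[start + span - 1]
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0.0:
            continue  # degenerate chord; skip this placement
        best_j, best_d = None, 0.0
        for j in range(start + 1, start + span - 1):
            px, py = points[j]
            # Signed perpendicular distance of sample j from the chord.
            d = (dx * (py - y0) - dy * (px - x0)) / length
            if abs(d) > abs(best_d):
                best_j, best_d = j, d
        if best_j is not None:
            scores[best_j] += best_d
    return scores


corner = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
scores = deviation_scores(corner, span=5)
```

Each window placement is one "look" at the neighborhood of a point, so a genuine corner collects evidence from several placements while a perfectly straight run collects none.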

While  the  particular  choice  of  a  curvature  measure 
as  a  component  in  a  complete  system  for  selecting  the 
most  salient  points  (critpts)  on  a  planar  curve  depends 
on  many  factors,  it  is  still  interesting  to  compare  the  raw 
scores  returned  by  curvature-analogs  represented  by  the 
R/J-Curvature  with  the  extreme  points  (ultimately)  se¬ 
lected  by  our  algorithm  (SSS)  as  shown  in  Figures  2  and 
3  for  a  randomly  generated  curve.  In  these  figures  we 
observe  problem  situations  that  highlight  some  of  the 
differences between the two underlying metrics (R/J-Curvature
and F/B-Saliency).⁷

⁷ In both of the figures, we used fixed common scale parameters
for both metrics as noted in the figure captions. It should be
remembered that R/J-curvature, as we define it in this paper, is
representative of a whole class of curvature-based metrics and is
not intended to duplicate the complete Rosenfeld/Johnston algorithm -
they also incorporate a procedure for finding a preferred
stick length. However, many of the problems with the performance
of the complete algorithm, which are discussed in Davis77 and in
other of the papers we reference, can be observed in the performance
of the R/J-Curvature metric.

There are some problems with any raw measure of curvature
that must be dealt with by using procedures that
invoke (at least) local context. For example, in Figure 3
we see a case (double arrow) where two critpts were selected
at almost adjacent locations along the curve. This
undesirable behavior was not eliminated by the simple
"non-maximum suppression" filter that produced good
results in most other situations. It is necessary to use
more specific criteria in deciding when two critpts are
too close together, and also, what to do when the adjacent
points have equal saliency scores (e.g., arbitrarily
eliminate one of them or eliminate both and place
a new critpt between them). In Figure 3 we see cases
(two single arrows) where almost invisible features were
chosen as critpts because they did have locally extreme
curvature scores; how do we decide when to reject such
occurrences? In Figure 2 we see a case where a critpt (designated
by an arrow) was inserted at a location displaced
from the position we consider correct; this was due, in
part, to the length of the arms of the angle measuring
"operator" relative to the size of the feature (see Figure
2d) - it is not always possible (or practical) to find an
appropriate operator size for every potential feature. In
the following sections (and appendices) of this paper we
describe and justify the methods we employ to deal with
these problems. The issue we are primarily concerned
with in this section is the choice of a basic saliency metric.
We justify our preference for the F/B-S metric on
two grounds:

1.  Unlike  the  fixed  scale  mathematical  (FSM)  curva¬ 
ture  analogs  (e.g.,  R/J-curvature),  F/B-S  rarely 
makes  an  error  in  positioning  a  critpt,  or  in  ignoring 
a  salient  point  that  human  observers  would  select. 
The issue here is robustness: F/B-S integrates information
over an extended set of "looks" at the curve
segment  containing  the  point  whose  saliency  is  be¬ 
ing  measured.  FSM  techniques  take  a  single  look 
at  the  situation.  Thus,  our  main  problem  with  the 
F/B-S  metric  is  selecting  the  most  salient  of  the  se¬ 
lected  critpts  to  be  retained  as  our  final  result  (the 
filtering  operation  generally  involves  the  elimination 
of  less  than  half  of  the  points  originally  selected). 

2.  The F/B-S metric is responsive to both the curvature
and the size of a curve "feature." This provides
a common basis for ranking critpts at a given
scale  (so  that  the  larger  of  two  geometrically  sim¬ 
ilar  objects  is  assigned  a  higher  saliency  score)  as 
well  as  across  scales  by  taking  into  account  the  size 
of  the  operator.  The  FSM-curvature  analogs  are 
insensitive  to  the  size  of  the  feature  -  they  inherit 
the  mathematical  property  that  curvature  is  a  point 
property  and  only  the  smallest  neighborhood  about 
a  point  that  allows  us  to  measure  curvature  is  rel¬ 
evant  (this  implies  a  single  "natural  scale”  at  any 
point  on  a  curve;  a  concept  we  reject,  e.g.,  see  Fig¬ 
ure  4). 


4.2  Comparison  of  the  Saliency  Selection 
System  (SSS)  with  Human  Performance 

The primary criterion for judging the competence of
the  overall  saliency  selection  system  (SSS)  we  present 
in  this  paper  is  its  ability  to  match  human  performance 
-  both  in  the  defined  task  and  with  respect  to  generic 
evaluation  of  the  selected  critpts.  We  performed  a  set  of 
informal  experiments  with  11  human  subjects  (also  see 
the  experiments  described  in  Fischler86).  The  instruc¬ 
tions  given  to  the  subjects  and  the  resulting  selections 
are  shown  in  Figure  1.  We  also  show  the  selections  made 
by the SSS algorithm. The results of these experiments (and of
additional experiments not described here) can be summarized
as follows:

•  At  least  9  of  the  11  subjects  selected  the  same  set 
of  six  or  more  critpts  on  each  of  the  four  curves  we 
used  in  the  experiments,  and  the  SSS  chose  the  same 
set  of  critpts.  Every  critpt  selected  by  the  SSS  was 
also  selected  by  at  least  one  human  subject. 

•  In  spite  of  the  high  degree  of  consistency  in  the 
overall  selection  of  salient  points,  the  human  sub¬ 
jects  differed  in  the  order  in  which  they  chose  these 
points.  We  tried  a  number  of  experiments  in  which 
the  only  difference  was  a  very  slight  change  in  the 
wording  of  the  instructions,  and  obtained  different 
orderings  (across  the  same  set  of  selected  points) 
from  our  subjects.  It  is  obvious  that  the  subjects 
used  a  global  strategy  to  match  the  task  (differ¬ 
ent  for  each  subject)  to  choose  the  order  in  which 
the  points  were  selected  -  even  though  the  specific 
points  selected  were  largely  determined  by  local  con¬ 
text. 

In addition to the curves used in the human experiments,
we ran the SSS algorithm on the order of 1000
randomly generated curves, and it made no obvious errors.
Figure 5 shows the results of a (typical) sequence of 40
consecutive experiments.

5.  Dealing with the Problems of Scale
and  Resolution 

A  vision  system,  concerned  with  creating  a  descrip¬ 
tion  of  some  object  that  may  be  encountered  again  in 
the  future,  perhaps  when  the  object  is  closer  or  further 
away,  must  take  scale  or  magnification  into  account  when 
deciding  what  shape  elements  to  pay  attention  to.  Un¬ 
der  extreme  changes  in  resolution,  when  salient  features 
might  appear  or  disappear,  it  may  not  be  possible  to 
make an informed judgement in the assignment of relative
saliency scores; but for a limited range about a given
resolution,  this  should  indeed  be  possible. 

Obviously,  geometric  properties  of  objects  that  are  in¬ 
variant  over  scale  are  especially  valuable  in  describing 
and  recognizing  the  objects,  since  absolute  scale  is  of¬ 
ten  impossible  to  judge  in  an  image,  and  even  relative 


scale  can  be  difficult  to  describe  or  measure  if  the  mea¬ 
surement  must  be  referenced  to  the  global  geometry  of 
the  object.  One  of  the  main  issues  we  address  in  this 
paper  is  how  to  define  extrema  in  the  "bending”  of  a 
curve  as  a  local  effectively  scale-invariant  property  that 
is  in  agreement  with  the  judgement  of  the  human  visual 
system. 

If  we  define  criticality  of  points  on  a  digitally  rep¬ 
resented  curve  in  terms  of  quantities  that  have  dimen¬ 
sions  that  must  be  measured  by  some  physical  process, 
then  there  is  no  direct  way  of  invoking  such  formally 
defined  mathematical  concepts  as  the  derivative,  or  cur¬ 
vature,  which  require  limiting  processes  of  infinite  reso¬ 
lution.  Approximations  to  these  concepts  are  resolution 
dependent  (e.g.,  the  size  of  the  operator  employed)  and 
measurements  made  on  most  objects  will  not  "scale”  in 
any  simple  or  uniform  way.  Further,  if  we  examine  a 
curve  through  a  fixed  size  window  (either  a  fixed  region 
of  a  computer  screen,  or  the  foveal  region  of  the  hu¬ 
man  retina),  and  we  successively  increase  the  resolution 
at  which  the  curve  is  displayed,  some  of  its  parts  will 
eventually  disappear  from  view,  and  some  of  the  smaller 
original  structures,  that  were  not  significant,  will  now 
dominate  the  visible  appearance  of  the  curve  (e.g.,  Fig¬ 
ure  4). 

If  the  mathematical  definition  of  curvature  were  ap¬ 
plicable  to  digital  imagery,  then  many  (but  not  all)  of 
the  issues  of  scale  could  be  resolved.  There  is  still  the 
problem  that  a  very  small  "glitch”  can  have  a  very  high 
value  of  curvature  but  a  very  low  psychological  signifi¬ 
cance.  Thus  the  scale  or  size  of  a  "feature”  (e.g.,  the 
glitch)  is  an  issue.  The  term  "feature”  does  not  appear 
in  our  problem  definition;  in  fact,  by  focusing  on  local 
curve properties, we had hoped to eliminate the need to
invoke this concept since an appropriate definition is far
from obvious.* Since scale can’t be ignored (even if
we  had  a  good  approximation  for  curvature  in  the  digi¬ 
tal  domain  that  was  independent  of  scale)  the  following 
questions  arise: 

•  The distinction, if any, between resolution and scale

•  How  to  choose  a  range  of  scales  appropriate  to  the 
specified  performance  criteria 

•  How  to  measure  criticality  at  different  scales 

•  How  to  compare  criticality  values  computed  at  dif¬ 
ferent  scales 

•  The  relationship  between  smoothing  and  scale 
change 

*Intuitively, there are sections of any given curve that we call
features; these entities provide the psychological basis for the
selection and relative saliency of the associated critpts. Critpts are
markers that define the shape and boundary of features - the extent
of the curve corresponding to a feature will generally subsume
the "region of support" for the curvepoints comprising the
feature. Features can overlap, and their boundaries are not always
apparent.


•  The  relation  between  operator  size  and  scale  change 

•  How  to  make  cooperation/competition  judgements 
across  scales 

•  How  to  determine  the  features  for  which  we  ex¬ 
pect  consistency  (of  criticality  scores)  to  hold  across 
scales,  and  where  such  consistency  can’t  be  expected 
(if  the  latter  were  never  the  case,  we  could  always  do 
our  analysis  at  one  scale  and  compute  the  criticality 
values  at  other  scales  as  needed). 

While  consistency  at  all  scales  and  for  all  features  is 
not  possible,  over  some  range  of  scales  (say  5:1)  we  ex¬ 
pect  there  to  be  a  "normalization”  factor  which  allows 
us  to  compare  the  saliency  scores  computed  at  one  scale 
with  values  computed  at  other  scales.  We  would  also 
expect  that  relative  locations  of  local  extrema  for  cer¬ 
tain  features  would  remain  fixed  as  a  curve  is  scaled, 
regardless  of  the  size/scale  of  the  operator  that  assigns 
the  criticality  scores. 

Some of the earliest work (e.g., Rosenfeld and Johnston)
on finding salient points merged the problem of
assigning  a  curvature  measure  to  a  point  with  that  of  de¬ 
termining  the  scale  at  which  to  measure  curvature.  The 
key  idea  is  that  each  point  has  a  single  scale  at  which 
its  curvature  should  be  measured  -  this  scale  is  usually 
found  by  a  search  process  over  successively  larger  scales 
until  some  measured  quantity  achieves  a  local  extremum. 

5.1  Change of Scale vs. Change of Resolution

If  we  magnify  a  continuous  curve  that  was  originally 
represented  at  infinite  precision,  every  point  of  the  new 
image  corresponds  to  a  point  in  the  original  image,  but 
its x and y coordinate values have been multiplied by
some  real  number  which  we  will  call  the  scale  factor. 
No  information  was  introduced  nor  lost,  but  the  phys¬ 
ical  space  required  to  render  the  curve  has  increased. 
However,  if  the  original  curve  was  represented  at  finite 
resolution  (e.g.,  each  point  as  a  pair  of  integer  coordi¬ 
nates),  then  (say)  doubling  the  scale  leaves  us  with  a 
disconnected  set  of  points.  Filling  in  the  gaps  requires 
introducing  new  information.  Here  we  will  say  that  a 
change  of  resolution  has  occurred  (a  change  in  resolu¬ 
tion  can  also  result  in  the  loss  of  information,  as  in  the 
case  of  demagnification  or  smoothing  at  some  fixed  reso¬ 
lution).  Thus,  the  concept  of  a  scale  change  corresponds 
to  a  reversible  transformation,  while,  in  general,  a  change 
in  resolution  involves  an  irreversible  process  in  which  in¬ 
formation  is  lost  (as  in  smoothing),  or  new  information 
is  introduced  (as  can  occur  in  zooming). 
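
The distinction can be illustrated with a minimal Python sketch. The helper names (`is_8_connected`, `scale`, `fill_gaps`) and the sample curve are hypothetical, and midpoint insertion is only one assumed way of filling the gaps; the point is that the scale change is undone exactly, while connecting the magnified points requires coordinates that were never in the data.

```python
def is_8_connected(points):
    """True if consecutive integer points are at most one pixel apart."""
    return all(
        max(abs(x1 - x0), abs(y1 - y0)) <= 1
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def scale(points, factor):
    """Pure scale change: multiply every coordinate by the scale factor."""
    return [(x * factor, y * factor) for x, y in points]


def fill_gaps(points):
    """Insert the midpoint between consecutive points (assumed interpolation rule)."""
    filled = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        filled.append(((x0 + x1) // 2, (y0 + y1) // 2))  # new information
        filled.append((x1, y1))
    return filled


curve = [(0, 0), (1, 1), (2, 1), (3, 2), (4, 2)]   # hypothetical digital curve
doubled = scale(curve, 2)                           # scale change only
restored = [(x // 2, y // 2) for x, y in doubled]   # reversible
filled = fill_gaps(doubled)                         # change of resolution

print(is_8_connected(curve), is_8_connected(doubled), is_8_connected(filled))
print(restored == curve, len(filled) - len(doubled), "points were invented")
```

The inserted midpoints depend on the interpolation rule chosen, which is why the filled curve cannot be regarded as a mere rescaling of the original.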

If  we  compute  the  curvature  for  points  on  a  continuous 
(infinite resolution) curve at two different scales, we will
generally  get  two  distinct  sets  of  values  (e.g.,  a  circle 
with  radius  2  is  a  scaled  version  of  a  circle  with  radius 
1,  but  by  definition,  their  curvatures  are  in  the  ratio 
1:2.  On  the  other  hand,  the  angles  of  a  triangle  remain 


unaltered  under  a  scale  change).  It  will  be  the  case, 
however,  that  for  smooth  curves,  the  local  extrema  will 
be  found  at  corresponding  locations  -  but  even  here,  the 
numerical  values  of  curvature  will  not  scale  in  any  simple 
way  (curvature  is  a  nonlinear  function). 
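
The circle and triangle examples above can be checked numerically. The sketch below is an illustration under stated assumptions: it estimates curvature from three points by the circumscribed-circle (Menger) formula, which is our choice here and not a measure used by the SSS, and the function names are hypothetical.

```python
import math


def menger_curvature(p, q, r):
    """Curvature of the circle through p, q, r: 4 * area / (|pq| * |qr| * |rp|)."""
    twice_area = abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))
    return 2.0 * twice_area / (math.dist(p, q) * math.dist(q, r) * math.dist(r, p))


def vertex_angle(p, q, r):
    """Angle at q, in radians, between the segments q->p and q->r."""
    a1 = math.atan2(p[1] - q[1], p[0] - q[0])
    a2 = math.atan2(r[1] - q[1], r[0] - q[0])
    d = abs(a1 - a2) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def scaled(points, factor):
    return [(factor * x, factor * y) for x, y in points]


# Three points on a circle of radius 1, and the same points magnified by 2.
arc = [(math.cos(t), math.sin(t)) for t in (0.0, 0.5, 1.0)]
k1 = menger_curvature(*arc)
k2 = menger_curvature(*scaled(arc, 2))

# A corner formed by two segments, before and after the same magnification.
corner = [(1.0, 0.0), (0.0, 0.0), (0.0, 1.0)]
a1 = vertex_angle(*corner)
a2 = vertex_angle(*scaled(corner, 2))

print(k1, k2)   # curvature changes with scale
print(a1, a2)   # the angle does not
```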

5.2  SSS Mechanisms for Evaluating
Saliency at Different Scales and Resolutions

In  designing  a  computational  module  to  evaluate 
saliency  subject  to  the  ideas  discussed  above,  we  can 
pursue  at  least  three  distinct  strategies: 

1.  Assume  that  saliency  is  independent  of  scale,  or  that 
there  is  a  natural  scale  associated  with  each  location 
on  the  curve  that  must  be  discovered. 

2.  Use  a  fixed  scale  saliency  measure,  but  generate 
multiple  versions  of  the  given  curve  at  some  pre¬ 
determined  set  of  scales. 

3.  Parameterize  the  saliency  measure  to  give  results 
approximating those that would be obtained from
strategy  (2)  for  the  selected  scales. 

We previously argued against strategy (1) on the assumption
that a unique natural scale cannot generally
be  associated  with  a  single  curvepoint  (see  Figure  4). 
We  have  chosen  strategy  (3)  since  strategies  (2)  and  (3) 
are  conceptually  compatible,  but  (3)  could  be  compu¬ 
tationally  more  efficient  if  we  can  find  a  simple  way  to 
use  some  combination  of  operator  scaling  and  score  nor¬ 
malization  so  that  both  approaches  give  (nominally)  the 
same  scores  in  most  situations.  Intuitively,  doubling  the 
stick  length  (in  the  F/B-S  metric)  for  a  simple  convex 
section  of  a  curve  should  result  in  four  times  the  score 
assigned  to  the  corresponding  critpt:  The  stick  is  now 
positioned twice the distance from the critpt in most of
its  "looks”  (i.e.,  placements  of  the  stick  which  subsume 
a  curve  segment  containing  the  critpt),  and  there  are 
twice  as  many  looks.  Thus,  the  procedure  we  employ, 
normalizing  all  scores  by  dividing  by  the  square  of  the 
sticklength,  will  leave  invariant  the  saliency  scores  as¬ 
signed  to  features  which  should  be  scale  invariant,  such 
as  the  angle  formed  by  two  (effectively)  infinite  straight 
lines.  On  the  other  hand,  for  those  features  that  have 
limited  extent  along  the  curve,  comparable  to  the  scales 
we  wish  to  discriminate  among,  the  larger  scaled  versions 
of  the  features  will  be  assigned  higher  scores. 
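
The quadratic growth of the raw score can be checked on the simplest scale-invariant feature, a right-angle corner whose arms are longer than the stick. The sketch below is a simplified model and not the full traversal of Appendix B: it assumes one look per pixel step of one stick end along the first arm, places the other end on the second arm so that the chord length equals the stick length, and applies a noise threshold of 20 percent of the stick length. The function name `corner_score` is hypothetical.

```python
import math


def corner_score(stick_length, noise_fraction=0.2):
    """Raw accumulated deviation at a right-angle corner (simplified model)."""
    threshold = noise_fraction * stick_length
    total = 0.0
    for a in range(stick_length + 1):              # one "look" per pixel step
        b = math.sqrt(stick_length ** 2 - a ** 2)  # other stick end on second arm
        deviation = a * b / stick_length           # distance from corner to chord
        if deviation > threshold:                  # ignore looks inside the noise band
            total += deviation
    return total


raw_10 = corner_score(10)
raw_20 = corner_score(20)
print("raw score ratio:", raw_20 / raw_10)
print("normalized scores:", raw_10 / 10 ** 2, raw_20 / 20 ** 2)
```

Under these assumptions the ratio of the raw scores is close to four and the two normalized scores nearly agree; the residual difference comes from sampling the looks at whole-pixel steps.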

6.  Cooperation/Competition  Interac¬ 
tions  Between  Critical  Points 

An  important  contribution  of  this  paper  over  the  work 
presented in Fischler and Bolles [Fischler86] is a major
revision  of  the  approach  to  filtering  the  critpts,  based 
both  on  comparisons  at  a  given  scale  as  well  as  across 


different  scales.  At  a  conceptual  level,  there  are  two  main 
differences. 

First, in the earlier work we did not use the information
about the sign (concavity/convexity) of the computed
F/B-Saliency; in our current algorithm, we separate
all the candidate critpts into two sets corresponding
to positive and negative F/B-S.* These two sets
are  processed  independently  of  each  other  (by  identical 
procedures)  and  the  resulting  selections  are  combined  by 
logical  union  to  produce  the  final  output.  Our  own  obser¬ 
vations  confirm  those  of  other  researchers  (e.g.,  Hoffman 
and  Richards),  that  positive  and  negative  curvature  ex¬ 
trema  appear  to  be  distinguished  from  each  other  by  the 
HVS,  in  part  because  they  play  different  roles  in  parti¬ 
tioning  and  description  tasks. 

Second,  in  the  earlier  work  we  used  a  simple  "domi¬ 
nance”  criterion  for  competition  of  closely  spaced  critpts 
detected at different scales. A critpt detected at some
given  scale  would  suppress  all  critpts  detected  at  smaller 
scales  (shorter  "sticklength”)  that  were  located  within  a 
specified  scale  related  distance  from  it.  This  rule  rarely 
produced "ugly" errors, but occasionally caused the obviously
correct critpt to be deleted in favor of one slightly
displaced  from  the  preferred  location.  A  significant  por¬ 
tion  of  the  work  described  in  this  paper  has  been  fo¬ 
cused  on  finding  a  more  effective  and  uniform  basis  for 
establishing  "local  dominance.”  In  other  sections  of  this 
paper  we  provided  a  justification  for  a  normalization  fac¬ 
tor  which  would  permit  us  to  assign  a  saliency  ranking 
to  competing  critpts,  regardless  of  the  scale  at  which 
they  were  originally  detected.  Thus,  competition,  both 
within  and  across  different  scales  is  now  treated  in  a 
uniform  manner.  In  the  following  subsection  we  discuss 
some  of  the  specific  problems  that  must  be  resolved  in 
competition  resolution,  and  the  algorithmic  procedures 
we  invoke  to  deal  with  these  problems. 

6.1  Mechanisms  for  Filtering  Competing 
Critpts 

One  of  the  algorithmic  mechanisms  we  devised  to  deal 
with  the  above  problems  (described  in  greater  detail  in 
Appendix  B)  is  to  construct  an  array  with  one  slot  for 
each  indexed  location  along  the  curve  (conceptually  two 
such arrays, one each respectively for positive and negative
saliency scores). Each slot is either free or "owned"
by  exactly  one  critpt.  A  critpt  occupies  only  one  of  the 
slots  it  owns  -  this  occupied  slot  corresponds  to  its  actual 
location  along  the  curve.  A  "new”  critpt,  contending 
for  a  slot,  must  have  a  normalized  score  greater  than  the 

*For an open curve segment, the assignment of positive vs. negative
is arbitrary; the important consideration is that we use the
information about the direction of deviation of the curve from the
stick to separate detected critpts into the two possible categories
which are then processed separately.

All the potential critpts are detected, sorted, and then entered
into the array in increasing order of saliency to avoid sequence
dependent effects.


existing  value  stored  in  the  slot  to  capture  it.  If  a  new 
critpt  captures  a  slot  occupied  by  (as  opposed  to  sim¬ 
ply  being  owned  by)  a  previously  dominant  critpt,  all  of 
the  slots  of  the  now  dominated  critpt  are  also  captured. 
This  mechanism  provides  a  way  of  avoiding  the  need  to 
choose  a  fixed-sized  "base  of  support”  for  a  critpt. 

7.  Algorithm  Performance 

The  algorithm  discussed  in  the  previous  sections  of 
this  paper,  and  described  in  Appendix  B,  has  been  com¬ 
pared  with  human  performance  (Figure  1),  and  has  been 
run on hundreds of randomly generated images (as described
in Appendix A) without making any obvious errors.
In all these cases the same set of parameters was
used with no operator involvement. Figure 5 shows 40
consecutively  generated  random  curves  and  the  critpts 
selected  by  the  algorithm.  Figure  6  in  Appendix  C  shows 
results  of  the  algorithm  run  on  curves  extracted  from  real 
images. 

8.  Discussion

Curve  partitioning  is  an  active  research  area  which 
not  only  is  of  theoretical  interest  as  a  basic  element  in 
pictorial description (e.g., Attneave, Bengtsson and Eklundh,
Hoffman and Richards), and for providing insight
into  the  partitioning  problem  in  general  (e.g.,  Fischler 
and  Bolles),  but  has  many  potential  applications.  Some 
of  the  more  immediate  ones  include:  data  compression 
by  using  critpts  as  the  basis  for  regenerating  a  curve 
by  straight  line  or  spline  interpolation  (e.g.  Imai  and 
Iri,  Teh  and  Chin),  matching/recognition  using  critpts 
and/or  the  partitioned  curve  segments  (e.g.,  Mokhtar- 
ian  and  Mackworth,  Wuescher  and  Boyer),  and  as  a  key 
component  of  an  interface  for  man-machine  communi¬ 
cation  about  pictorial  objects  (the  ability  to  point  at 
icons  representing  symbolic  objects  has  revolutionized 
the  computer-user  interface;  to  extend  this  capability, 
one  would  like  to  be  able  to  point  to  a  location  in  an 
image  and  have  the  machine  be  able  to  deduce  the  com¬ 
ponent  being  referred  to  -  image  partitioning  in  gen¬ 
eral,  and  especially  curve  partitioning,  are  critical  to  this 
goal). 

In  this  paper  we  have  focused  on  one  specific  aspect 
of  the  curve  partitioning  problem:  Duplicating  human 
performance  in  the  selection  of  a  small  number  of  points 
(called critpts) along a curve segment which could be
used as the basis for reconstructing the curve at some
future time. While there will generally be a significant
degree  of  overlap  in  the  points  selected  by  the  tech¬ 
niques  referenced  above  (focused  on  different  applica¬ 
tions),  there  are  also  significant  differences.  There  has 
been  very  little  recent  work  on  the  generic  problem  of 
choosing  psychologically  salient  points  with  which  to  di¬ 
rectly  compare  our  results.  On  the  other  hand,  we  have 
conducted  a  relatively  large  number  of  experiments  with 


uniformly  good  results  (e.g.,  see  Figure  5). 

There  are  two  major  paradigms  underlying  the  pub¬ 
lished  work  on  partitioning  planar  curves.  The  first  in¬ 
volves  obtaining  a  mathematically  differentiable  repre¬ 
sentation  of  the  given  digital  curve  by  the  use  of  splin- 
ing  or  Gaussian  convolution  (e.g.,  Mokhtarian86).  This 
gives  good  results  for  many  applications,  but  the  salient 
points  on  the  smoothed  curve  are  often  displaced  from 
their  original  locations  (or  eliminated).  This  paradigm 
is  not  suitable  for  our  purposes  in  this  paper. 

The  second  paradigm,  which  includes  the  work  de¬ 
scribed  here,  is  to  first  measure  some  approximation  to 
the  curvature  at  each  point  on  a  curve.  This  usually  in¬ 
volves  choosing,  or  finding,  an  appropriate  scale  at  which 
to  make  the  curvature  measurement.  This  is  typically  ac¬ 
complished  by  making  the  curvature  measurement  over 
increasingly  larger  curve  segments  (centered  on  the  curve 
point being evaluated) until either the computed curvature
at the point, or some related quantity, reaches a local
extremum. Each point is assigned a saliency/criticality
value  (its  estimated  curvature)  and  an  interval  length 
along  the  curve  centered  on  the  point  (called  its  re¬ 
gion  of  support).  The  region  of  support  is  then  used  for 
non-maximum  suppression  -  each  point  suppresses  other 
points  with  lower  criticality  scores  falling  in  its  region  of 
support. 
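
For comparison, the conventional suppression step described above can be sketched as follows. This is a generic illustration and not code from any of the cited systems; the function name and the representation of the region of support as a half-width in index units are assumptions.

```python
def non_maximum_suppression(scores, support):
    """Keep each scored point unless a higher-scoring point's region covers it.

    scores[i]  : criticality assigned to curve point i (0 means not a candidate)
    support[i] : half-width, in index units, of the region centered on point i
    """
    kept = []
    for i, s in enumerate(scores):
        if s <= 0:
            continue
        suppressed = any(
            scores[j] > s and abs(i - j) <= support[j]
            for j in range(len(scores))
            if j != i
        )
        if not suppressed:
            kept.append(i)
    return kept


scores = [0, 1, 3, 2, 0, 0, 0, 2.5, 0]
print(non_maximum_suppression(scores, [2] * len(scores)))

# Two adjacent candidates with equal scores: neither suppresses the other.
print(non_maximum_suppression([0, 2, 2, 0], [1] * 4))
```

The second call shows the limitation noted earlier in the paper: when adjacent points have equal scores, a simple filter of this kind retains both.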

Major  differences  between  our  approach  and  other 
work  under  this  second  paradigm  include: 

•  A  generic  saliency  measure  which  often  selects 
points  corresponding  to  local  curvature  extrema, 
but  which  in  many  situations  is  in  better  accord 
with  human  selection  preference  and  placement  ac¬ 
curacy. 

•  A  distinct  approach  to  the  problem  of  dealing  with 
curve  features  salient  at  different  scales.  The  con¬ 
ventional  approach  is  to  associate  a  single  scale  with 
each  curve  point  which  in  turn  defines  a  fixed  re¬ 
gion  of  support  to  be  used  for  non-maximum  sup¬ 
pression.  In  our  approach,  we  measure  the  saliency 
of  each  curve  point  at  a  number  of  different  scales, 
and have developed procedures for allowing potential
critpts, found at different scales and spatial locations,
to compete with each other. This competition
is not restricted to any fixed extent of the curve
(which  thus  avoids  anomalous  selections  caused  by 
an  important  event  occurring  just  beyond  the  fixed 
limit  of  search,  i.e.,  the  horizon  effect). 

Additional approaches are available for partitioning 1-D
curves; for example, see Fischler and Wolf [Fischler83] or Witkin
[Witkin83]. As noted in Appendix B, the 1-D partitioning technique
in the Fischler83 reference is used as a component of the SSS
algorithm.

It is interesting to note that we have not found a use for cooperative
reinforcement - cooperation appears to be a global relation.
Competition is important at the local level (e.g., lateral inhibition).


Our  approach  to  local  saliency  selection  can  be  con¬ 
sidered  a  form  of  automated  preattentive  perception. 
Potential  extensions  could  include  dealing  with  more 
global  curve  features,  such  as  recognizing  the  intersec¬ 
tion  of  extended  straight  line  segments,  or  transition 
points  between  analytic  curves  with  different  parame¬ 
ters,  or  global  symmetries  and  repeated  structure.  Rec¬ 
ognizing  these  more  global  structures,  and  ranking  them 
with  respect  to  human  perceived  saliency,  may  well  fall 
outside  the  competence  of  the  basic  approach  described 
in  this  paper. 

9.  Appendices 

9.1  Appendix  A:  Generation  of  Random 
Curves 

The  following  method  was  used  to  construct  the  ran¬ 
dom  curves  used  in  the  experiments  described  in  the 
body  of  this  paper. 

(1)  Thirty  (x,y)  pairs  are  generated  for  each  curve. 
Each value of x and y is generated by a uniform-
distribution  (0-1)  random-number  generator  and  then 
multiplied  by  100  to  produce  numbers  (coordinate- 
values)  uniformly  distributed  between  0  and  100. 

(2)  The  thirty  points  are  next  linked  by  a  minimal- 
spanning-tree  (MST). 

(3)  A  diameter  path  is  extracted  from  the  MST,  and 
the  ordered  subset  of  the  original  randomly  generated 
points  that  fall  along  this  diameter  path  are  the  input 
sequence  provided  to  a  spline-fitting  routine  [Cline74] 
which  returns  a  continuous  curve  represented  by  a  se¬ 
quence  of  (x,y)  coordinate  pairs.  These  sequences,  typ¬ 
ically  containing  on  the  order  of  150-250  points,  are  the 
random  curves  used  in  our  experiments. 
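
The three steps above can be sketched in Python as follows. Several choices here are assumptions and not details taken from the report: the MST is built with Prim's algorithm, the diameter path is taken to be the longest path measured by Euclidean edge length, and a Catmull-Rom spline stands in for the spline-fitting routine of [Cline74], so the number of points per curve will differ from the figures quoted above. All function names are hypothetical.

```python
import math
import random


def minimal_spanning_tree(points):
    """Prim's algorithm; returns an adjacency list {index: [neighbor, ...]}."""
    adjacency = {i: [] for i in range(len(points))}
    best = {i: (math.dist(points[0], points[i]), 0) for i in range(1, len(points))}
    while best:
        j = min(best, key=lambda k: best[k][0])     # closest point not yet in the tree
        _, parent = best.pop(j)
        adjacency[parent].append(j)
        adjacency[j].append(parent)
        for k in best:
            d = math.dist(points[j], points[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return adjacency


def farthest_from(start, points, adjacency):
    """Return (farthest node, parent map) for a traversal of the tree from start."""
    dist, parent, stack = {start: 0.0}, {start: None}, [start]
    while stack:
        u = stack.pop()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + math.dist(points[u], points[v])
                parent[v] = u
                stack.append(v)
    return max(dist, key=dist.get), parent


def diameter_path(points, adjacency):
    """Longest path in the tree (by Euclidean length), as an ordered index list."""
    end_a, _ = farthest_from(0, points, adjacency)
    end_b, parent = farthest_from(end_a, points, adjacency)
    path = [end_b]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def catmull_rom(ctrl, samples_per_span=12):
    """Interpolating spline through ctrl (a stand-in for the [Cline74] routine)."""
    pts = [ctrl[0]] + list(ctrl) + [ctrl[-1]]       # repeat the two end points
    curve = []
    for i in range(1, len(pts) - 2):
        p0, p1, p2, p3 = pts[i - 1], pts[i], pts[i + 1], pts[i + 2]
        for s in range(samples_per_span):
            t = s / samples_per_span
            curve.append(tuple(
                0.5 * (2 * b + (c - a) * t + (2 * a - 5 * b + 4 * c - d) * t * t
                       + (3 * b - a - 3 * c + d) * t ** 3)
                for a, b, c, d in zip(p0, p1, p2, p3)))
    curve.append(tuple(ctrl[-1]))
    return curve


def random_curve(n_points=30, seed=None):
    """Steps (1)-(3): random points, MST, diameter path, spline through the path."""
    rng = random.Random(seed)
    points = [(100 * rng.random(), 100 * rng.random()) for _ in range(n_points)]
    tree = minimal_spanning_tree(points)
    return catmull_rom([points[i] for i in diameter_path(points, tree)])


curve = random_curve(seed=1)
print(len(curve), "curve points")
```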

9.2  Appendix  B:  An  Algorithm  For 
Computing  Curve-Point  Criticality 

The  partitioning  algorithm  described  in  Fischler  and 
Bolles  [Fischler86]  has  been  modified  and  extended  as 
summarized  below. 

The algorithm collects candidates (peaks) for the critical
points of a curve by examining the deviation of the
points of the curve from a chord or "stick" that is iteratively
advanced along the curve. Sticks of different
lengths are used to find critical points that are salient
at different "natural" scales on the given curve. (Except
when  explicitly  stated  otherwise,  two  sticks  were  used 
for  all  the  experiments  discussed  in  this  paper;  one  of 
length  10  pixels  and  the  other  of  length  20  pixels.)  The 
algorithm  provides  the  option  of  using  arc-length  along 
the  curve,  or  the  euclidean  length  of  the  stick,  to  de¬ 
termine  the  separation  of  the  endpoints  of  the  stick  on 
the  curve;  we  used  the  euclidean  length  of  the  stick  for 
all  of  the  experiments  discussed  in  this  paper.  One  end 
of  the  stick  is  advanced  along  the  curve,  one  pixel  at  a 
time,  and  the  other  end  is  placed  at  the  first  (sequential) 


position  further  along  the  curve  for  which  the  Euclidean 
distance  equals  or  exceeds  the  specified  stick  length. 

For  each  placement  of  the  stick,  an  accumulator  asso¬ 
ciated  with  the  curve-point  (in  the  interval  of  the  curve 
between  the  two  endpoints  of  the  stick)  of  maximum 
deviation  from  the  stick  is  incremented  by  the  absolute 
value  of  the  distance  from  the  point  to  the  stick  if  this 
distance  exceeds  a  predefined  noise  threshold.  However, 
for  the  given  stick  placement,  if  there  is  more  than  one 
excursion  (exit  and  return)  outside  the  noise  region,  the 
underlying  model  is  violated  and  the  accumulators  are 
not  incremented.  (The  noise  threshold  was  uniformly 
set to 20 percent of stick length; thus a euclidean deviation
of more than 2 "pixels" from a stick of length 10
was  required  to  cause  any  modification  of  the  associated 
accumulator.) 

To  deal  with  direction  dependent  effects,  a  complete 
traverse  is  made  in  both  directions  along  the  curve  sum¬ 
ming  the  results  in  the  same  accumulators.  The  points 
which  have  locally  maximum  scores  in  the  accumulators 
(called  peaks)  for  any  of  a  given  set  of  sticks  are  the 
points  from  which  the  critical  points  will  be  selected. 
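
The accumulation step described above can be sketched as follows for a single stick length. This is an illustrative reading of the text and not the authors' implementation. It assumes the curve is a list of (x, y) points spaced about one pixel apart, counts an "excursion" as a run of consecutive points lying outside the noise band, and measures the sign of the deviation against the chord oriented in the forward direction of the curve. The names `stick_accumulate` and `find_peaks` are hypothetical.

```python
import math


def stick_accumulate(curve, stick_length, noise_fraction=0.2):
    """Return (scores, signed): summed |deviation| and summed signed deviation."""
    n = len(curve)
    scores = [0.0] * n
    signed = [0.0] * n
    threshold = noise_fraction * stick_length
    for step in (1, -1):                            # forward, then backward traverse
        order = range(n) if step == 1 else range(n - 1, -1, -1)
        for i in order:
            # Other stick end: first point further along at distance >= stick length.
            j = i + step
            while 0 <= j < n and math.dist(curve[i], curve[j]) < stick_length:
                j += step
            if not 0 <= j < n:
                continue
            lo, hi = min(i, j), max(i, j)
            (x0, y0), (x1, y1) = curve[lo], curve[hi]
            dx, dy = x1 - x0, y1 - y0
            chord = math.hypot(dx, dy)
            # Signed perpendicular distance of each spanned point from the stick.
            devs = [(dx * (y - y0) - dy * (x - x0)) / chord
                    for x, y in curve[lo:hi + 1]]
            outside = [abs(d) > threshold for d in devs]
            # An excursion starts wherever a point is outside and its predecessor is not.
            excursions = sum(1 for prev, cur in zip([False] + outside, outside)
                             if cur and not prev)
            if excursions != 1:
                continue                            # nothing above noise, or model violated
            k = max(range(len(devs)), key=lambda m: abs(devs[m]))
            scores[lo + k] += abs(devs[k])
            signed[lo + k] += devs[k]
    return scores, signed


def find_peaks(scores):
    """Indices with a positive score that neither neighbor exceeds."""
    peaks = []
    for k, s in enumerate(scores):
        left = scores[k - 1] if k > 0 else 0.0
        right = scores[k + 1] if k + 1 < len(scores) else 0.0
        if s > 0 and s >= left and s >= right:
            peaks.append(k)
    return peaks


# An L-shaped test curve with its corner at index 30.
corner_curve = [(x, 0) for x in range(31)] + [(30, y) for y in range(1, 31)]
scores, signed = stick_accumulate(corner_curve, 10)
print(find_peaks(scores))
```

For the L-shaped curve every contributing look has its maximum deviation at the corner, so the corner is the only peak. A plateau of equal scores would yield several adjacent peaks under this definition of `find_peaks`.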

The  following  information  is  collected  for  each  peak 
and  used  to  find  the  critical  points: 

•  INDEX:  the  sequence  number  along  the  curve  of  the 
point  at  which  the  peak  was  located. 

•  STICK:  the  length  of  the  stick  (in  pixels)  used  to 
find  the  peak. 

•  DEV:  the  sign  of  the  deviation  of  the  peak  with 
respect  to  the  curve. 

•  NSCORE:  the  "normalized”  score  which  is  the  score 
in  the  accumulator  for  the  peak  divided  by  the 
square  of  the  stick  length. 

The  peaks  are  divided  into  two  groups  with  like-signed 
deviation  DEV.  The  critical  points  for  the  two  groups 
are  found  independently  of  each  other  and  their  union  is 
returned  as  the  set  of  critical  points  for  the  curve. 

In  finding  the  critical  points,  we  stipulate  that  each 
peak’s score has a region of support, plus and minus half
its  associated  stick  length,  on  each  side  of  its  position 
along  the  curve.  An  array  (the  support  array)  equal  to 
the  length  of  the  curve  is  used  to  store  the  support  in¬ 
formation.  The  support  information  for  a  peak  is  a  list 
(NSCORE  INDEX  STICK).  For  each  peak,  the  support 
information  may  be  entered  at  every  index  location  cov¬ 
ered  by  the  region  of  support  depending  on  what  was 
previously  stored  in  the  location. 

For  all  locations  in  the  support  region  for  the  new 
peak  (in  the  support  array),  an  entry  at  J  is  replaced  by 
the  information  for  the  new  peak  if  there  is  no  previous 
entry in the array or if the score for the new peak is
greater than the score in the existing entry in the array. In
addition,  if  the  entry  J  is  being  replaced,  and  J  is  also 


the  INDEX  for  a  peak  that  was  entered  previously,  the 
support  information  for  the  new  peak  replaces  the  sup¬ 
port  information  of  the  old  peak  wherever  it  occurs  in 
the  support  array  (i.e.  even  outside  of  the  new  peak’s 
original  support  region). 

After  the  above  processing,  the  critical  points  for  the 
curve  are  designated  as  those  points  whose  index  into  the 
support array equals the index stored in the information
list  of  the  array  element. 

It  can  be  seen  that  the  order  in  which  peaks  are  en¬ 
tered  into  the  support  array  can  affect  the  final  selection 
of  the  critical  points  because  a  peak’s  region  of  support 
can  be  altered  by  the  “capture”  process,  and  thus  de¬ 
pends  on  the  state  of  the  support  array  at  the  time  the 
peak  is  entered.  In  our  implementation  of  the  algorithm 
for  running  the  experiments,  we  entered  the  peaks  into 
the  support  array  as  soon  as  they  were  computed  in  or¬ 
der  to  gain  computational  efficiency  and  simplicity,  and 
still obtained excellent results. In the current version of
the  algorithm  we  collect  all  the  peaks  for  all  the  sticks, 
sort  the  peaks  by  their  normalized  scores,  and  then  enter 
them  into  the  support  array  in  order  of  increasing  score. 
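
The support-array procedure can be sketched as follows for one of the two sign groups. This is a minimal reading of the description above, under stated assumptions: peaks are (NSCORE, INDEX, STICK) tuples, the region of support is measured in index units (about one pixel per curve point), and a slot is captured only by a strictly greater score. The function name is hypothetical.

```python
def select_critical_points(peaks, curve_length):
    """Return the indices of the critical points for one sign group of peaks."""
    support = [None] * curve_length
    for peak in sorted(peaks):                      # enter in increasing NSCORE order
        nscore, index, stick = peak
        half = stick // 2
        lo = max(0, index - half)
        hi = min(curve_length - 1, index + half)
        for j in range(lo, hi + 1):
            old = support[j]
            if old is not None and nscore <= old[0]:
                continue                            # existing owner keeps this slot
            support[j] = peak
            if old is not None and old[1] == j:
                # Slot j was occupied by the old peak: capture every slot it owned,
                # even those outside the new peak's own region of support.
                for m in range(curve_length):
                    if support[m] == old:
                        support[m] = peak
    # A peak survives if it still owns the slot at its own location.
    return [j for j, entry in enumerate(support)
            if entry is not None and entry[1] == j]


# Hypothetical peaks: (NSCORE, INDEX, STICK).
print(select_critical_points([(0.30, 50, 20), (0.10, 55, 10), (0.20, 90, 10)], 100))
print(select_critical_points([(0.1, 10, 20), (0.2, 12, 4)], 30))
print(select_critical_points([(0.1, 10, 20), (0.2, 18, 4)], 30))
```

In the second call the stronger peak captures the slot occupied by the weaker one and therefore inherits all of its slots; in the third call it only takes over a few owned slots, so both peaks survive. The final set of critical points for a curve would be the union of the results for the positive and negative groups.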

There  are  some  additional  aspects  of  the  algorithm 
that  are  further  discussed  in  the  more  complete  version 
of this paper, including ways to handle problems associated
with very sharp angles and competing critpts of
approximately equal saliency scores.

9.3  Appendix  C:  Partitioning  Curves 
Extracted From Aerial Imagery

A  technique  for  detecting  and  delineating  low  resolu¬ 
tion  linear  structures  appearing  in  aerial  imagery,  such  as 
roads and rivers, was described by the authors of this paper
in an earlier publication [Fischler83]. The algorithm
was  effective  in  finding  such  structure,  but  it  provided  no 
mechanism  for  distinguishing  between  the  semantically 
meaningful  objects  and  the  “accidental”  and  irrelevant 
linear  features  found  in  most  real  images.  In  work  now  in 
progress,  we  use  the  SSS  algorithm  to  “slice  up”  the  in¬ 
dividual  curves  found  by  the  delineation  algorithm.  We 
throw  away  the  very  small  resulting  segments  which  are 
typical  of  accidental  linear  formations,  and  then  further 
filter  the  longer  segments  with  respect  to  a  set  of  seman¬ 
tic  constraints.  Those  segments  that  pass  through  the 
filtering  process  are  then  “glued”  back  together  to  pro¬ 
duce  the  desired  delineation.  This  process  is  illustrated 
in  Figure  6.  Figure  6a  shows  an  aerial  image,  and  6b 
shows  the  linear  segments  extracted  by  use  of  the  orig¬ 
inal  delineation  algorithm.  Figure  6c  shows  those  seg¬ 
ments  that  passed  through  the  filters  mentioned  above, 
and  Figure  6d  shows  the  result  of  a  final  step  to  retain 
only the more significant roads and trails. The two panes
of Figure 6e show the results of applying the SSS algorithm
to some of the 120 curves highlighted in Figure
6b (they have been isolated and separated into the two
panes to allow clear display of the partition points and
to prevent confusion due to the intersections of distinct
curves). The robustness of the SSS algorithm is essential
in carrying out the filtering operation. Insertion of extraneous
partition points would cause the loss of portions
of the road network; absence of valid partition points
would allow meaningless appendages to become part of
the extracted network.
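
The slice, filter, and glue steps can be illustrated with a short Python sketch. It is only a schematic under stated assumptions: a curve is a list of (x, y) points, the partition points are given as indices into that list, "very small" is reduced to a point-count threshold, and the semantic constraints (which are not specified here) are represented by a hypothetical predicate named keep.

```python
def slice_curve(curve, partition_indices):
    """Cut a curve (a list of (x, y) points) at the given interior indices.

    Adjacent pieces share the partition point at which they were cut.
    """
    cuts = [0] + sorted(partition_indices) + [len(curve) - 1]
    return [curve[a:b + 1] for a, b in zip(cuts, cuts[1:])]


def filter_segments(segments, min_points, keep=lambda segment: True):
    """Drop very small segments, then apply a caller-supplied constraint."""
    return [s for s in segments if len(s) >= min_points and keep(s)]


def glue(segments):
    """Rejoin consecutive surviving segments that still share an endpoint."""
    glued = []
    for segment in segments:
        if glued and glued[-1][-1] == segment[0]:
            glued[-1] = glued[-1] + segment[1:]
        else:
            glued.append(list(segment))
    return glued
```

An extraneous partition point creates a short piece that the length filter discards, which leaves a gap in the glued result; a missing partition point leaves an unwanted appendage attached to a longer piece that passes the filter.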

10.  References 

1. F. Attneave, "Some informational aspects of visual
perception," Psychol. Rev. 61:183-193, 1954.

2. A. Bengtsson and J.O. Eklundh, "Shape Representation
by Multiscale Contour Approximation,"
IEEE Trans PAMI-13(1):85-93, Jan. 1991.

3. A.K. Cline, "Scalar- and planar-valued curve fitting
using splines under tension," CACM 17(4):218-223,
April 1974.

4. L.S. Davis, "Understanding shape: angles and
sides," IEEE Trans. Comput. C-26:236-242, March
1977.

5. M.A. Fischler and R.C. Bolles, "Perceptual organization
and curve partitioning," IEEE Trans PAMI-
8(1):100-105, Jan. 1986.

6. M.A. Fischler and H.C. Wolf, "Linear Delineation,"
Proceedings IEEE CVPR-83, June 1983, pp 351-
356; also, Readings in Computer Vision (M.A. Fischler
and O. Firschein, eds.), Morgan Kaufmann, pp
204-209, 1987.

7. M.A. Fischler and P. Barrett, "An iconic transform
for sketch completion and shape abstraction," CGIP
13:334-360, 1980.

8. D. Hilbert and S. Cohn-Vossen, "Geometry and the
Imagination," Chelsea, 1952.

9. D.D. Hoffman and W.A. Richards, "Representing
smooth plane curves for recognition: implications
for figure-ground reversal," Proc. 2nd Nat. Conf.
Artificial Intelligence, Pittsburgh, PA, pp 5-8, Aug.
1982.

10. H. Imai and M. Iri, "Computational-geometric
methods for polygonal approximations of a curve,"
CVGIP-36(1):31-34, Oct. 1986.

11. D.G. Lowe, "Organization of smooth image curves
at multiple scales," Proc 2nd ICCV, pp. 558-567,
1988.

12. R. Mehrotra, S. Nichani, and N.
Ranganathan, "Corner detection," Pattern Recognition
23(11):1223-1233, 1990.

13. F. Mokhtarian and A. Mackworth, "Scale-based description
and recognition of planar curves and two-
dimensional shapes," IEEE PAMI 8(1):34-43, Jan.
1986.

14. T. Pavlidis and S.L. Horowitz, "Segmentation of
plane curves," IEEE Trans. Comput. C-23:860-870,
Aug. 1974.

15. W. Richards and D. Hoffman, "Codon constraints
on closed 2D shapes," in Human and Machine Vision
II (A. Rosenfeld, ed.), Academic Press, pp 207-
223, 1986.

16. W. Richards, B. Dawson, and D. Whittington, J.
Optical Soc. Amer. 3(9):1483-1491, Sept. 1986.

17. A. Rosenfeld and E. Johnston, "Angle detection in
digital curves," IEEE Trans. Comput. C-22:875-
878, 1973.

18. A. Rosenfeld and J.S. Weszka, "An improved
method of angle detection on digital curves," IEEE
Trans. Comput. C-24:940-941, Sept. 1975.

19. C.H. Teh and R.T. Chin, "On the detection of dominant
points on digital curves," IEEE Trans PAMI-
11(8):859-872, Aug. 1989.

20. A. Witkin, "Scale Space Filtering," Proc. 8th IJCAI,
Karlsruhe, West Germany, pp 1019-1022, Aug.
1983.

21. D.M. Wuescher and K.L. Boyer, "Robust contour
decomposition using a constant curvature criterion,"
IEEE Trans PAMI-13(1):41-51, Jan. 1991.


CURVE  PARTITIONING:  Instructions 
For  each  enclosed  curve: 


Assume that 10 years from now you will be asked to reconstruct the given curve.
A  reasonably  correct  reconstruction  will  be  rewarded  by  a  large  sum  of  money  (say 
$5000).  You  can  record,  for  later  use,  the  locations  of  up  to  nine  points  along  the 
curve  to  help  you  do  the  reconstruction  -  but  it  will  cost  you  $200  for  each  such 
point  (to  be  subtracted  from  your  prize  if  you  receive  the  reward).  Please  mark  your 
selected  points  on  the  curve.  Do  not  select  the  endpoints,  they  will  be  provided  free. 
Do  not  take  more  than  one  minute  per  curve. 


Critical  points  found  by  the  SSS  algorithm 


Points  chosen  by  at  least  1  of  11  test  subjects 


Figure  1:  Comparison  of  human  and  SSS  algorithm  performance  in  the  curve 
partitioning  task.  (Each  of  the  curves  used  in  the  experiments  with  human 
subjects  was  contained  in  a  square  that  was  1.5  inches  on  a  side.) 


(a)  Test  curve  189 


(b)  SSS  selected  critpts 


(c)  R/J-curvature 
selected  critpts 


(d)  Anomalous  area 
(magnified) 


(e) Plot of R/J-curvature along test curve. Abscissa = sequence number of point on curve.
Ordinate = angle (in degrees) computed at point. (Angle-arms are 10 units each for
R/J-C; standard stick lengths of 10 and 20 units are employed by SSS.)


Figure  2:  Comparison  of  SSS  and  R/J-curvature  metrics  evaluated  on  test  curve  189. 
The  continuous  curve  in  (e)  represents  R/J-curvature  along  the  test  curve  shown  in  (a). 
The  vertical  lines  in  (e)  mark  the  sequentially  numbered  critpts  selected  by  SSS  as  shown 
in  (b).  The  critpts  corresponding  to  the  extreme  values  of  R/J-curvature  shown  in  (c) 
are  marked  as  circles  in  (e).  The  arrow  in  (c),  and  in  the  corresponding  location  in  (e), 
illustrates  an  anomalous  selection  using  R/J-curvature.  (d)  shows  the  computed  values  of 
R/J-curvature, 153°, at the preferred location and 122° at the location of the anomalous
selection. 


(a) Test curve 166


(b) SSS selected critpts


(c) R/J-curvature
selected critpts


(d) Plot of R/J-curvature along test curve. Abscissa = sequence number of point on curve.
Ordinate = angle (in degrees) computed at point. (Angle-arms are 10 units each for
R/J-C; stick length is 20 units for F/B-S.)


Figure  3:  Comparison  of  SSS  and  R/J-curvature  metrics  evaluated  on  test  curve  166. 
The  continuous  curve  in  (d)  represents  R/J-curvature  along  the  test  curve  shown  in  (a). 
The  vertical  lines  in  (d)  mark  the  sequentially  numbered  critpts  selected  by  SSS  as  shown 
in (b). The critpts corresponding to the extreme values of R/J-curvature shown in (c)
are marked as circles in (d). The arrows in (c), and in the corresponding locations in
(d), illustrate anomalous selections using R/J-curvature.


Figure  4:  Curvature  and  saliency  are  functions  of  curve  resolution.  As  illustrated  in  (a) 
above, we can draw more than one visually acceptable tangent to many of the points on this
curve  at  the  given  resolution.  As  resolution  increases,  tangent  2  would  dominate  at  point  x; 
as  resolution  decreases,  tangent  1  would  dominate  at  the  same  point.  In  (b),  the  angle  at  x 
can be seen as 45° at one scale and 90° at a larger scale. Thus, curvature and saliency are
not unique properties of curve points.


Figure  5:  Critical  points  found  by  the  SSS  algorithm  for  a  set  of  40  random  curves. 


(a)  Aerial  photograph 


(b)  Initial  extraction  of 
linear  structure 


(c) Filtered linear structure
using SSS algorithm


(d) Delineation of major
roads and trails


Figure 6: Application of the SSS algorithm to the problem of delineating linear features
in aerial photographs.


Appendix  C: 

Object-Centered Surface Reconstruction:
Combining Multi-Image Stereo and Shading


DARPA IU WORKSHOP 1993


Object-Centered Surface Reconstruction:
Combining  Multi-Image  Stereo  and  Shading* 


P. Fua and Y. Leclerc
SRI  International 

333  Ravenswood  Avenue,  Menlo  Park,  CA  94025 
(fua@ai.sri.com leclerc@ai.sri.com)


Abstract 

Our goal is to reconstruct both the shape and reflectance
properties of surfaces from multiple images.
We argue that an object-centered representation is
most appropriate for this purpose because it naturally
accommodates multiple sources of data, multiple
images (including motion sequences of a rigid object),
and self-occlusions. We then present a specific
object-centered reconstruction method and its
implementation. The method begins with an initial
estimate of surface shape (provided by triangulating
the result of conventional stereo or other
means). The surface shape and reflectance properties
are then iteratively adjusted to minimize an
objective function that combines information from
multiple input images. The objective function is a
weighted sum of "stereo," shading, and smoothness
components, where the weight varies over the surface.
For example, the stereo component is weighted
more strongly where the surface projects onto highly
textured areas in the images, and less strongly otherwise.
Thus, each component has its greatest influence
where its accuracy is likely to be greatest.
Experimental results on both synthetic and real images
are presented.

1  Introduction 

The  problem  of  recovering  the  shape  and  re¬ 
flectance  properties  of  a  surface  from  multi¬ 
ple  images  has  received  considerable  attention 
[6, 20, 35, 44].  This  is  a  key  problem  not  only  in 

*Support for this research was provided in part by
various contracts from the Advanced Research Projects
Agency.


developing  general-purpose  vision  systems,  but 
also  in  specialized  areas  such  as  the  generation 
of  Digital  Elevation  Models  from  aerial  images 
[5, 12, 26, 53].

In  this  paper,  we  view  the  recovery  problem 
as  one  of  finding  an  object-centered  description 
of  a  surface  from  a  set  of  input  images  that  is 
sufficiently  complete,  in  terms  of  its  geometric 
and  radiometric  properties,  that  it  is  possible 
to  generate  an  image  of  the  surface  from  any 
viewpoint.  In  particular,  the  description  should 
be  sufficiently  complete  to  reproduce  the  input 
images to within a certain tolerance, given models
of the cameras, their relative locations, and
expected noise.

Our  surface  reconstruction  method  uses  an 
object-centered  representation,  specifically,  a 
triangulated  3-D  mesh  of  vertices.  Such  a  rep¬ 
resentation  accommodates  the  two  classes  of 
information  mentioned  above,  as  well  as  mul¬ 
tiple  images  (including  motion  sequences  of 
a  rigid  object)  and  self-occlusions.  We  have 
chosen to model the surface material using
the Lambertian reflectance model with variable
albedo, though generalizations to specular surfaces
would be relatively straightforward. Consequently,
the natural choice for the monocular
information source is shading, while intensity is
the natural choice for the image feature used
in multi-image correspondence. Not only are
these the natural choices given a Lambertian
reflectance model, they are also complementary
[7, 30]: intensity correlation is most accurate
wherever  the  input  images  are  highly  textured, 
whereas  shading  is  most  accurate  when  the  in¬ 
put  images  are  untextured. 




The  reconstruction  method  is  to  minimize 
an  objective  function  whose  components  de¬ 
pend  on  the  input  images  and  some  measure  of 
the  complexity  of  the  3-D  mesh.  The  method 
starts with an initial estimate for the mesh
derived  from  the  triangulation  of  conventional 
stereo  results,  and  uses  a  standard  optimization 
technique  called  conjugate  gradient  descent  to 
minimize  the  objective  function.  The  image- 
dependent  components  of  the  objective  func¬ 
tion  are  related  to  the  two  sources  of  informa¬ 
tion  mentioned  above.  We  take  advantage  of 
the  complementary  nature  of  the  information 
sources  by  weighting  the  components  at  each 
facet  of  the  triangulated  mesh  according  to  the 
amount  of  texturing  within  the  area  of  the  im¬ 
ages  that  the  facet  projects  to.  The  projection 
uses  a  hidden-surface  algorithm  to  take  occlu¬ 
sions  into  account. 
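
To make the optimization step concrete, the following Python sketch shows the conjugate gradient iteration on a toy problem. It is an illustration only: the objective function of this paper is not quadratic, whereas the sketch minimizes the quadratic 0.5 x'Ax - b'x (with A symmetric and positive definite), for which the step length along each search direction is available in closed form. The function name and the test matrix are assumptions introduced for the example.

```python
def conjugate_gradient(A, b, x, iterations):
    """Minimize 0.5 * x'Ax - b'x for symmetric positive definite A.

    A is a list of rows, b and x are lists; returns the final iterate.
    """
    # The residual r = b - Ax is the negative gradient of the quadratic.
    r = [bi - sum(aij * xj for aij, xj in zip(row, x))
         for row, bi in zip(A, b)]
    d = list(r)                      # first direction: steepest descent
    rr = sum(ri * ri for ri in r)
    for _ in range(iterations):
        if rr < 1e-24:               # gradient is numerically zero
            break
        Ad = [sum(aij * dj for aij, dj in zip(row, d)) for row in A]
        alpha = rr / sum(di * adi for di, adi in zip(d, Ad))  # exact step
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        rr_new = sum(ri * ri for ri in r)
        # New direction: the residual made conjugate to the previous one.
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
    return x
```

For a non-quadratic objective the closed-form step would be replaced by a line search along each direction, but the way successive search directions are combined is the same.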

In  the  following  section,  we  describe  related 
work  and  our  contributions  in  this  area.  Fol¬ 
lowing  this  we  discuss  some  of  the  key  issues 
in  multi-image  surface  reconstruction  and  how 
to  combine  different  sources  of  information  for 
such  purposes.  We  then  describe  in  detail  our 
specific  procedure,  discuss  the  behavior  of  our 
procedure  on  synthetic  data,  and  show  some  re¬ 
sults  on  real  images. 

2  Related  Work  and  Contri¬ 
butions 

Three-dimensional  reconstruction  of  visible  sur¬ 
faces  continues  to  be  an  important  goal 
of  the  computer  vision  research  community. 
Initially,  much  of  the  work  concentrated 
on 2½-dimensional image-centered reconstructions,
such as Barrow and Tenenbaum’s Intrinsic
Images [6] and Marr’s 2½-D Sketch [35].
These  preliminary  ideas  have  been  the  basis  for 
quite  successful  systems  for  recovering  shape 
and  surface  properties.  Some  have  used  sin¬ 
gle  sources  of  information,  such  as  sequences  of 
range data or intensity images [3, 25], stereo
[12,  26,  52,  53],  and  shading  [21,  24,  44].  Oth¬ 
ers  have  combined  sources  of  information,  such 
as  shading  and  texture  [8],  features  and  stereo 
[23],  focus,  vergence,  stereo,  and  camera  cali¬ 
bration  [1].  See  [2]  for  further  discussions  on 


information  fusion. 

More  recently,  full  3-dimensional  models  have 
been  used,  such  as  3-D  surface  meshes  [46,  49], 
parameterized  surfaces  [40, 33],  particle  systems 
[42,  17],  and  volumetric  models  [36,  45,  37]. 

As with the 2½-dimensional representations,
3-D  representations  have  used  a  variety  of  sin¬ 
gle  image  cues  for  reconstruction,  such  as  sil¬ 
houettes  and  image  features  [9,  11,  47,  48,  50], 
range  data  [51],  stereo  [17],  and  motion  [41]. 
Liedtke [32] first uses silhouettes to derive an
initial  estimate  of  the  surface,  and  then  uses 
a  multi-image  stereo  algorithm  to  improve  on 
the  result.  Their  approach  to  deriving  an  ini¬ 
tial  estimate  for  the  mesh,  as  with  Szeliski  and 
Tonneson’s  approach  [42],  is  significantly  more 
powerful  than  the  one  we  use  in  this  paper.  This 
is  an  important  topic  for  future  research. 

Of  special  relevance  to  this  paper  is  research 
in  combining  stereo  and  shape  from  shading. 
Using 2½-dimensional representations, Blake et
al. [7] is the earliest reference we are aware
of that discusses the complementary nature of
stereo and shape from shading, but that paper
presents almost no experimental results.
Leclerc and Bobick [31] discuss the integration
of stereo and shape from shading, but
only  use  stereo  as  an  initial  condition  to  a  dis¬ 
crete  height  from  shading  algorithm.  Cryer  et 
al.  [10]  combine  the  high-frequency  information 
from  a  shape  from  shading  algorithm  with  the 
low-frequency  information  from  a  stereo  algo¬ 
rithm  using  filters  designed  to  match  those  in 
the  human  visual  system. 

Using full 3-D representations, Heipke [22]
integrates  the  two  cues,  but  assumes  that  the 
images  can  be  separated  beforehand  into  zones 
of  variable  albedo  (where  one  does  stereo)  and 
areas  of  constant  albedo  (where  one  does  shape 
from  shading).  This  is  in  contrast  to  our  ap¬ 
proach,  in  which  the  optimization  procedure  dy¬ 
namically  adapts  to  the  image  data. 

In  this  paper,  we  unify  the  idea  of  using  3-D 
meshes  to  integrate  information  from  multiple 
images  with  that  of  using  multiple  cues.  Our 
specific approach to this unification has led to
a  number  of  important  contributions: 

•  We correctly deal with occlusions by using
a hidden surface algorithm during the
reconstruction process.

•  Our  technique  for  doing  stereo  avoids  the 
constant  depth  assumption  of  traditional 
correlation-based  stereo  algorithms,  effec¬ 
tively  using  variable-sized  windows  in  the 
images. 

•  Our  approach  to  shape  from  shading  is 
applicable  to  surfaces  with  slowly  varying 
albedo. This is a significant advance over
traditional  approaches  that  require  con¬ 
stant  albedo. 

•  We  propose  a  dynamic  weighting  scheme 
for  combining  shape  from  shading  and 
stereo,  and  demonstrate  that  it  leads  to  sig¬ 
nificantly  better  results  than  using  either 
cue alone, on both synthetic and real
images.

To  demonstrate  the  validity  of  the  overall  ap¬ 
proach,  we  have  implemented  a  computation¬ 
ally  effective  optimization  procedure,  and  have 
demonstrated  that  it  finds  good  minima  of  the 
objective  function  on  both  synthetic  and  real 
images. 

3  Issues  in  Multi-Image  Sur¬ 
face  Reconstruction 

In  this  section,  we  briefly  discuss  some  of  the  key 
issues  in  multi-image  surface  reconstructions, 
and  outline  how  we  address  the  issues  in  this 
paper.  These  outlines  will  be  expanded  upon  in 
Section  4. 

3.1  Surface  Shape  and  its  Represen¬ 
tation 

Since  the  task  is  to  reconstruct  a  surface  from 
multiple  images  whose  vantage  points  may  be 
very  different,  we  need  a  surface  representation 
that  can  be  used  to  generate  images  of  the  sur¬ 
face  from  arbitrary  viewpoints,  taking  into  ac¬ 
count  self-occlusion,  self-shadowing,  and  other 
viewpoint-dependent  effects.  Clearly,  a  single 
image-centered representation, such as a depth
map,  is  inadequate  for  this  purpose.  Instead, 
an  object-centered  surface  representation  is  re¬ 
quired. 


There  are  many  object-centered  surface  rep¬ 
resentations  that  are  possible.  However,  there 
are  some  practical  issues  that  are  important  in 
choosing  an  appropriate  one.  First,  the  repre¬ 
sentation  should  be  general-purpose  in  the  sense 
that it should be possible to represent any continuous
surface, closed or open, and of arbitrary
genus.  Second,  it  should  be  relatively  straight¬ 
forward  to  generate  an  instance  of  a  surface 
from  standard  data  sets  such  as  depth  maps  or 
clouds  of  points.  Finally,  there  should  be  a  com¬ 
putationally  simple  correspondence  between  the 
parameters  specifying  the  surface  and  the  actual 
3-D  shape  of  the  surface,  so  that  images  of  the 
surface  can  be  easily  generated,  thereby  allow¬ 
ing  the  integration  of  information  from  multiple 
images. 

A  hexagonally  connected  mesh  of  3-D  ver¬ 
tices,  as  in  Figure  2,  is  an  example  of  a  surface 
representation  that  meets  the  criteria  stated 
above,  and  is  the  one  we  have  chosen  for  this  pa¬ 
per.  Such  a  mesh  defines  a  surface  composed  of 
three-sided  planar  polygons  that  we  call  trian¬ 
gular  facets,  or  simply  facets.  Triangular  facets 
are  particularly  easy  to  manipulate  for  image 
and  shadow  generation,  since  they  are  the  ba¬ 
sis  for  many  3-D  graphics  systems.  Hexagonal 
meshes  can  be  used  to  construct  virtually  arbi¬ 
trary  surfaces.  Finally,  standard  triangulation 
algorithms  can  be  used  to  generate  such  a  sur¬ 
face  representation  from  real  noisy  data  [18, 42]. 
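
As a small illustration of this representation, the Python sketch below builds a triangulated mesh from a depth map. It is a sketch under stated assumptions rather than the triangulation procedure of [18, 42]: vertices are placed on a regular grid, each grid cell is split along the same diagonal so that every interior vertex has six neighbors, and the function names are hypothetical.

```python
def mesh_from_depth(depth):
    """Turn a depth map (a list of rows of z values) into a triangular mesh.

    Returns (vertices, facets): vertices are (x, y, z) tuples and facets are
    triples of vertex indices.
    """
    rows, cols = len(depth), len(depth[0])
    vertices = [(float(c), float(r), float(depth[r][c]))
                for r in range(rows) for c in range(cols)]
    facets = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            v00 = r * cols + c        # corner of the cell at (row r, col c)
            v01 = v00 + 1             # next column
            v10 = v00 + cols          # next row
            v11 = v10 + 1             # diagonal corner
            # Split every cell along the same diagonal (v00 to v11).
            facets.append((v00, v01, v11))
            facets.append((v00, v11, v10))
    return vertices, facets


def vertex_neighbors(facets, n_vertices):
    """For each vertex, collect the set of vertices it shares an edge with."""
    neighbors = [set() for _ in range(n_vertices)]
    for a, b, c in facets:
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))
    return neighbors
```

Because the facets index into a shared vertex list, moving one vertex changes the shape of all six facets around it, which is what allows the vertex coordinates to serve as the parameters of the surface.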

3.2  Material  Properties  and  their 
Representation 

Objects  in  the  world  are  composed  of  many 
types  of  material,  and  the  material  type  can 
vary  across  the  object’s  surface  in  many  ways. 
The  key  issues,  therefore,  are  the  type  of  mate¬ 
rial  we  wish  to  consider,  and  how  its  variation 
across  the  surface  is  to  be  represented.  In  gen¬ 
eral,  one  can  represent  a  material  type  by  its  re¬ 
flectance  function,  which  maps  the  wavelength 
distribution  and  orientation  of  a  light  source, 
the normal to the surface, and the viewing direction
into an image color. This function is
generally  quite  complex.  However,  there  are  re¬ 
flectance  functions  that  are  not  only  much  sim¬ 
pler,  but  are  also  quite  common.  Such  functions 
are modeled using only one or, at most, a few
parameters. Consequently, one can accurately
model  the  material  properties  of  a  surface  by 
representing  these  parameters  at  every  point  on 
the  surface. 

Probably  the  simplest,  and  most  common, 
such  function  is  the  Lambertian  reflectance 
function.  For  grey-level  images,  this  function 
not  only  has  a  single  parameter,  albedo,  which 
is the ratio of outgoing to incoming light intensity,
but the image intensity is independent
of  viewpoint.  For  this  reason,  we  have  chosen 
to  restrict  ourselves  to  Lambertian  surfaces  in 
this  paper.  However,  because  we  use  a  full  3- 
D  representation,  a  generalization  to  specular 
surfaces would be fairly straightforward.
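
The Lambertian model can be written out in a few lines of Python. This is a minimal sketch assuming a single distant light source and grey-level intensities; the function name is introduced for the example.

```python
import math


def lambertian_intensity(albedo, normal, light):
    """Intensity of a Lambertian patch: albedo times the cosine of the angle
    between the surface normal and the direction toward the light.

    normal and light are 3-vectors and need not be normalized. The viewing
    direction is not an argument: the result is independent of viewpoint.
    """
    n_len = math.sqrt(sum(c * c for c in normal))
    l_len = math.sqrt(sum(c * c for c in light))
    cos_theta = sum(n * l for n, l in zip(normal, light)) / (n_len * l_len)
    # A patch facing away from the light receives no direct illumination.
    return albedo * max(0.0, cos_theta)
```

A patch facing the light directly returns its albedo, and the intensity falls off with the cosine of the incidence angle until it reaches zero for patches turned away from the light.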

Having  chosen  a  specific  reflectance  function, 
the  remaining  issue  is  how  to  represent  the 
spatially-varying  parameter(s).  In  general,  one 
needs  to  be  able  to  represent  independent  pa¬ 
rameter  values  at  every  point  of  the  surface.  In 
terms  of  the  mesh  representation  of  the  surface, 
this implies some type of spatial sampling of
each  facet.  Given  the  finite  resolution  of  the 
images,  and  other  practical  considerations,  we 
have  chosen  to  use  two  types  of  spatial  sam¬ 
pling.  The  first  is  most  appropriate  when  the 
parameters  vary  quickly  across  the  surface,  and 
the  second  when  they  vary  more  slowly.  For 
the  former  case,  we  use  a  uniform  sampling  of 
each  facet,  where  the  inter-sample  spacing  cor¬ 
responds  roughly  to  no  more  than  one  or  two 
pixels in any of the images. For the latter case,
we  use  a  single  value  associated  with  each  facet. 
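
One way to obtain a uniform sampling of a triangular facet is a regular grid in barycentric coordinates, as in the Python sketch below. This is an assumed scheme offered for illustration, not necessarily the one used in our implementation; the rule for choosing the number of subdivisions from the projected edge length is likewise hypothetical.

```python
import math


def subdivisions_for(edge_length_pixels, spacing_pixels=1.0):
    """Pick enough subdivisions that samples are at most spacing_pixels
    apart along the longest projected edge (a hypothetical rule)."""
    return max(1, math.ceil(edge_length_pixels / spacing_pixels))


def sample_facet(p0, p1, p2, n):
    """Return points on a regular barycentric grid over the triangle
    (p0, p1, p2), with n subdivisions along each edge."""
    samples = []
    for i in range(n + 1):
        for j in range(n + 1 - i):
            a, b = i / n, j / n       # weights of p0 and p1
            c = 1.0 - a - b           # weight of p2
            samples.append(tuple(a * x0 + b * x1 + c * x2
                                 for x0, x1, x2 in zip(p0, p1, p2)))
    return samples
```

With n subdivisions the facet carries (n + 1)(n + 2) / 2 samples, so the per-sample representation grows quadratically with the projected size of the facet, whereas the single-value representation stores one number per facet.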

As  we  shall  see  later,  the  two  different  repre¬ 
sentations  are  used  somewhat  differently,  and 
the  choice  of  which  representation  to  use  is 
made  on  a  facet-by-facet  basis  as  a  function  of 
the images.

3.3  Information Sources for Reconstruction

There  are  a  number  of  information  sources  that 
are  available  for  the  reconstruction  of  a  surface 
and  its  material  properties.  Here,  we  consider 
two  classes  of  information. 

The  first  class  are  those  information  sources 
that require a single image, such as texture gradients,
shading, and occlusion edges. When using
multiple images and a full 3-D surface representation,
however, we can do certain things
that  cannot  be  done  with  a  single  image.  First, 
the  information  source  can  be  checked  for  con¬ 
sistency  across  all  images,  taking  into  account 
occlusions.  Second,  the  information  can  be  “av¬ 
eraged”  over  all  the  images,  when  the  source 
is  consistent  and  occlusions  are  taken  into  ac¬ 
count,  to  increase  its  sensitivity. 

The  second  class  are  those  information 
sources  that  require  at  least  two  images,  such 
as  the  triangulation  of  corresponding  points  be¬ 
tween  input  images  (given  camera  models  and 
their  relative  positions).  Generally  speaking, 
this  source  is  most  useful  when  corresponding 
points  can  be  easily  identified,  and  their  image 
positions  accurately  measured.  The  ease  and 
accuracy  of  this  correspondence  can  vary  sig¬ 
nificantly  from  place  to  place  in  the  image  set, 
and  depends  critically  on  the  type  of  feature 
used.  Consequently,  whatever  the  type  of  fea¬ 
ture  used,  one  must  be  able  to  identify  where  in 
the  images  that  feature  provides  reliable  corre¬ 
spondences,  and  what  accuracy  one  can  expect. 

The  image  feature  that  we  have  chosen  for 
correspondence  (though  it  is  by  no  means  the 
only  one  possible)  is  simply  intensity,  because 
the  Lambertian  reflectance  model  described  ear¬ 
lier  implies  that  the  image  intensity  of  a  surface 
point is independent of the viewing direction.
Therefore,  corresponding  points  should  have  the 
same  intensity  in  all  images.  Clearly,  intensity 
can  only  be  a  reliable  feature  when  the  albedo 
varies  quickly  enough  on  the  surface  (and,  con¬ 
sequently,  the  images  are  highly  textured),  and 
the  search  space  is  sufficiently  narrow.  Other¬ 
wise,  there  would  be  significant  ambiguity  in  the 
correspondence  of  pixels  across  the  images. 

In contrast to our approach, traditional
correlation-based  stereo  methods  use  fixed-size 
windows  in  images,  which  can  only  yield  correct 
results when the surface is parallel to the image
plane. Instead, we compare the intensities
as  projected  onto  the  facets  of  the  surface,  which 
is  equivalent  to  having  variable-shaped  windows 
in  the  images.  Consequently,  if  the  original  sur¬ 
face  is  well  modeled  by  a  mesh  surface,  the  re¬ 
construction  can  be  significantly  more  accurate. 
The  Hierarchical  Warp  Stereo  System  [39]  is  an¬ 
other  example  of  a  method  that  takes  into  ac¬ 
count  the  variable  shapes  of  windows  required 


for  accurate  reconstruction  of  a  surface,  though 
it  uses  only  an  image-centered  representation  of 
the  surface. 

As  for  the  monocular  information  source,  we 
have  chosen  to  use  shading.  There  are  a  number 
of  reasons  for  this.  First,  we  are  using  a  Lam¬ 
bertian  reflectance  model,  making  shading  a  rel¬ 
atively  simple  source  of  information.  Second, 
shading  is  most  reliable  when  the  albedo  varies 
slowly  across  the  surface,  which  is  the  natural 
complement  to  intensity  correspondence,  which 
requires  quickly  varying  albedo.  The  comple¬ 
mentary  nature  of  these  two  sources  should  al¬ 
low  us  to  accurately  recover  the  surface  geom¬ 
etry  and  material  properties  for  a  wide  variety 
of  images. 

In contrast to our approach, traditional uses
of  shading  information  assume  that  the  albedo 
is  constant  across  the  entire  surface,  which  is  a 
major  limitation  when  applied  to  real  images. 
We  overcome  this  limitation  by  improving  upon 
a  method  to  deal  with  discontinuities  in  albedo 
alluded  to  in  the  summary  of  [30, 31].  We  com¬ 
pute  the  albedo  at  each  facet  using  the  nor¬ 
mal  to  the  facet,  a  light-source  direction,  and 
the  average  of  the  intensities  projected  onto  the 
facet  from  all  images.  Since  we  use  the  aver¬ 
age  of  the  projected  intensities,  this  computed 
albedo  minimizes  the  mean  squared  error  be¬ 
tween  the  images  of  the  mesh  surface  and  the 
input  images.  The  variation  of  this  computed 
albedo  across  the  surface  is  the  actual  informa¬ 
tion  source  used  to  recover  the  surface.  For  ex¬ 
ample,  if  the  albedo  of  the  real  surface  were 
indeed  constant,  as  in  traditional  shape-from- 
shading  problems,  then  the  measured  variation 
in  albedo  will  be  zero  for  the  correct  mesh  sur¬ 
face,  and  we  will  have  recovered  both  surface 
shape  and  albedo.  The  distinct  advantage  of 
this  approach  over  the  traditional  one  is  that  it 
can  deal  with  surfaces  whose  albedo  is  not  con¬ 
stant,  but  instead  varies  slowly  over  the  surface. 
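
The albedo computation just described can be sketched in Python as follows. The sketch makes several assumptions that are not part of the text above: the light direction is a unit vector, the projected intensities for a facet have already been gathered into a list, and the variation of albedo is measured as a sum of squared differences between adjacent facets. All function names are introduced for the example.

```python
import math


def facet_normal(p0, p1, p2):
    """Unit normal of the triangle (p0, p1, p2) via the cross product."""
    u = [b - a for a, b in zip(p0, p1)]
    v = [b - a for a, b in zip(p0, p2)]
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)


def facet_albedo(corners, light, intensities):
    """Albedo implied by the mean projected intensity under the Lambertian
    model. light is assumed to be a unit vector. Returns None when the
    facet faces away from the light, where the albedo cannot be recovered.
    """
    normal = facet_normal(*corners)
    shading = sum(n * l for n, l in zip(normal, light))
    if shading <= 0.0:
        return None
    return (sum(intensities) / len(intensities)) / shading


def albedo_variation(albedos, adjacent_pairs):
    """Sum of squared albedo differences over pairs of adjacent facets
    (an assumed measure of variation)."""
    return sum((albedos[i] - albedos[j]) ** 2 for i, j in adjacent_pairs)
```

If the true albedo is constant and the mesh has the correct shape, every facet yields the same value and the variation is zero; an incorrect facet orientation changes the shading term and shows up as a spurious change in computed albedo.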

In the following subsection, we describe how
these  two  sources  of  information  are  combined 
and  used  to  reconstruct  surfaces. 


3.4  Combining  and  Using  Informa¬ 
tion  Sources 

Simply  put,  our  approach  to  surface  reconstruc¬ 
tion  is  to  adjust  the  parameters  of  the  surface 
(in  the  case  of  the  mesh,  this  means  the  coor¬ 
dinates  of  the  vertices),  until  the  images  of  the 
surface  are  most  consistent  with  the  informa¬ 
tion  sources  described  above.  This  approach  re¬ 
quires  a  number  of  things.  First,  one  must  have 
an  initial  estimate  of  the  surface.  In  this  pa¬ 
per,  this  is  derived  from  a  standard  correlation- 
based  stereo  algorithm.  Second,  one  must  know 
the  light  source  direction,  camera  models,  and 
their  relative  positions  so  that  images  of  the  sur¬ 
face  can  be  generated  (we  assume  these  are  pro¬ 
vided  a  priori).  Third,  one  must  have  a  way 
of quantifying what is meant by “most consistent
with the information sources.” Here, we
use  an  objective  function  that  is  a  linear  com¬ 
bination  of  components,  one  for  each  informa¬ 
tion  source,  whose  weights  are  determined  on  a 
facet-by-facet  basis  as  a  function  of  the  images. 
Finally, one must have a computationally effective
means of finding a surface, given the initial
estimate,  that  is  reasonably  close  to  the  best  of 
all  possible  surfaces  according  to  the  objective 
function. 

Our  combined  objective  function  has  three 
components,  two  of  which  were  mentioned 
above:  an  intensity  correlation  component,  and 
an  albedo  variation  component.  A  third  com¬ 
ponent  is  a  measure  of  the  smoothness  of  the 
surface.  The  first  two  components  are  weighted 
differently  at  each  facet  as  a  function  of  the  im¬ 
age  intensities  projected  onto  the  facet,  while 
the  surface  smoothness  component  has  the  same 
weight  everywhere,  but  is  typically  decreased  as 
the  iterations  proceed. 

Since the intensity correlation component depends
on the difference in intensity at a given
point,  it  is  most  accurate  when  the  images 
are  highly  textured  in  the  areas  that  the  facet 
projects  to.  To  see  this,  consider  the  case  when 
the  images  have  constant  intensity  in  the  neigh¬ 
borhood  of  the  projected  facet:  the  difference 
in  intensity  will  be  a  constant,  independent  of 
small  variations  in  the  facet’s  position  or  ori¬ 
entation.  On  the  other  hand,  when  the  images 
are  highly  textured,  small  changes  in  the  facet 


can  significantly  change  the  value  of  this  com¬ 
ponent.  Thus,  we  weight  the  intensity  correla¬ 
tion  component  most  strongly  for  those  facets  in 
which  the  projected  image  intensities  are  highly 
textured. 

Conversely,  the  albedo  variation  component 
is  most  accurate  when  the  intensities  within  a 
facet  vary  slowly.  This  is  because  we  are  assum¬ 
ing  that  the  albedo  varies  slowly  enough  across 
the  surface  that  a  constant-albedo  facet  is  a 
good  model  for  the  surface.  Since  the  facets  are 
planar,  this  should  produce  images  whose  inten¬ 
sities  are  constant  within  the  projected  facet. 
Thus,  we  weight  the  albedo  variation  compo¬ 
nent  most  strongly  when  the  projected  intensi¬ 
ties  within  a  facet  vary  slowly. 

Since  rapidly  changing  albedoes  produce 
highly  textured  image  regions,  our  weighting 
scheme,  in  effect,  turns  off  the  shading  com¬ 
ponent  and  turns  on  the  stereo  component  in 
such regions. Thus, it provides the shape from
shading  component  with  implicit  boundary  con¬ 
ditions  at  the  edge  of  regions  of  constant  albedo. 

The  surface  smoothness  component  is  re¬ 
quired  as  a  stabilizing  term  because  neither  of 
the  above  components  is  likely  to  be  exactly  cor¬ 
rect,  the  surfaces  are  not  exactly  Lambertian, 
the  camera  positions  are  not  exactly  correct, 
there  is  noise  in  the  images,  and  so  on.  Cur¬ 
rently,  we  use  the  heuristic  technique  of  starting 
with  a  relatively  large  weight  for  the  smoothness 
component and decreasing it as the iterations
proceed.  The  theoretically  optimal  point  at 
which  the  smoothness  weight  should  no  longer 
be  decreased  is  still  an  open  question,  although 
a  single,  empirically  determined,  value  has  been 
used  with  great  success  across  all  of  the  images 
presented  in  this  paper  when  using  all  of  the 
components. 

In  the  following  section,  we  describe  the  sur¬ 
face  representation  and  optimization  algorithm 
in  more  detail. 

4  Details of Surface Model
and Optimization Procedure

As  discussed  in  the  previous  section,  our  ap¬ 
proach  to  recovering  surface  shape  and  re¬ 


flectance  properties  from  multiple  images  is  to 
deform  a  three-dimensional  representation  of 
the  surface  so  as  to  minimize  an  objective  func¬ 
tion.  The  free  variables  of  this  objective  func¬ 
tion  are  the  coordinates  of  the  vertices  of  the 
mesh  representing  the  surface,  and  the  process 
is  started  with  an  initial  estimate  of  the  surface. 
For  the  experiments  described  in  this  paper,  we 
have  derived  this  initial  estimate  by  triangu¬ 
lating  the  smooth  depth-map  generated  by  the 
correlation-based  stereo  algorithm  described  in 
[19,  15].  Figure  1  illustrates  the  output  of  this 
algorithm  on  a  synthetic  stereo  pair. 

Alternatively,  we  could  have  relied  on  more 
sophisticated  algorithms  that  can  triangulate 
noisy  laser  or  stereo  range-data  to  derive  our 
initial estimates [14, 18, 42]. All these methods
tend to smooth the data and to interpolate
blindly  in  the  absence  of  data  so  that  their  out¬ 
put  needs  to  be  refined  by  algorithms  such  as 
ours. 

In  this  section,  we  describe  more  formally 
each  part  of  our  approach. 

4.1  Images  and  Camera  Models 

In  this  paper,  we  assume  that  images  are 
monochrome,  and  that  their  camera  models  are 
known a priori. The set of grey-level images is
denoted $G = (g_1, g_2, \ldots, g_{n_g})$. A point in an
image is denoted $\mathbf{u} = (u, v)$, and the intensity
of point $\mathbf{u}$ in image $g_i$ is denoted $g_i(\mathbf{u})$. For
non-integer values of $\mathbf{u}$ we use bilinear interpolation
over  the  four  points  represented  by  the  floor  and 
ceiling  of  the  coordinates  of  u. 
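
The interpolation rule just described can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `bilinear`, the row-major `image[row][col]` layout, the convention that u indexes columns and v indexes rows, and the clamping at the image border are all assumptions made here.

```python
import math

def bilinear(image, u, v):
    """Grey level at real-valued (u, v), interpolated from the four
    surrounding pixels. Assumes image[row][col], u = column, v = row."""
    u0, v0 = math.floor(u), math.floor(v)
    # Upper neighbours, clamped so that border pixels stay in range.
    u1 = min(u0 + 1, len(image[0]) - 1)
    v1 = min(v0 + 1, len(image) - 1)
    du, dv = u - u0, v - v0
    top = (1 - du) * image[v0][u0] + du * image[v0][u1]
    bottom = (1 - du) * image[v1][u0] + du * image[v1][u1]
    return (1 - dv) * top + dv * bottom
```

At integer coordinates the weights collapse to a single pixel, so the function returns the stored grey level unchanged.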

The projection of an arbitrary point $\mathbf{x} =
(x, y, z)$ in space into image $g_i$ is denoted $m_i(\mathbf{x})$.
There  are  well-known  methods  for  correcting 
both  geometric  and  radiometric  errors  in  im¬ 
ages,  as  surveyed  in  [4],  pp.  68-77.  Thus,  we 
assume  that  all  effects  of  lens  distortion  and  the 
like  have  been  taken  care  of  in  producing  the  in¬ 
put  images,  so  that  the  projection  of  a  surface 
into an image is well modeled by a perspective
projection. Thus, $\mathbf{u} = m_i(\mathbf{x})$ can be written as:


\[
\begin{pmatrix} U \\ V \\ W \end{pmatrix}
= M_i \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix},
\qquad u = U/W, \qquad v = V/W,
\]

where $M_i$ is a three by four projection matrix.

Figure 1: (a,b) A synthetic stereo pair generated by texture-mapping a real image of the Martin-Marietta
ALV test-site onto a Digital Elevation Model (DEM). (c) The disparity map using a correlation-based
algorithm. The black areas indicate that the stereo algorithm could not find a match. Elsewhere, lighter
greys indicate higher elevations. (d) The same disparity map after smoothing and interpolation.
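
The perspective projection above can be written out directly. The sketch below is only an illustration; the name `project` and the nested-list layout of the 3x4 matrix are assumptions, and no lens-distortion handling is included because the text assumes it has already been corrected.

```python
def project(M, x):
    """Project a 3-D point x = (x, y, z) with a 3x4 matrix M.
    Returns the image coordinates (u, v) = (U / W, V / W)."""
    X = (x[0], x[1], x[2], 1.0)  # homogeneous coordinates
    U, V, W = (sum(m * c for m, c in zip(row, X)) for row in M)
    if W == 0:
        raise ValueError("point projects to infinity")
    return U / W, V / W
```

For example, a hypothetical camera with focal length 100 and principal point (50, 40), written as `[[100, 0, 50, 0], [0, 100, 40, 0], [0, 0, 1, 0]]`, maps the point (1, 2, 10) to (60, 60).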

4.2  Surface  Representation 

We represent a surface $S$ by a hexagonally
connected set of vertices $V = (v_1, v_2, \ldots, v_{n_v})$
called a mesh. The position of vertex $v_j$ is specified
by its Cartesian coordinates $(x_j, y_j, z_j)$.
Figure  2  shows  such  a  mesh  as  a  wire  frame 
and  as  a  shaded  solid  surface. 

Each  vertex  in  the  interior  of  the  surface  has 
exactly six neighbors. The neighbors of vertex
$v_j$ are consistently ordered in a clock-wise fashion.
Vertices on the edge of a surface may have
anywhere  from  two  to  five  neighbors. 

Neighboring  vertices  are  further  organized 
into  triangular  planar  surface  elements  called 
facets, denoted $F = (f_1, f_2, \ldots, f_{n_f})$. The vertices
of a facet are also ordered in a clock-wise
fashion. In this work, we require that the initial
estimate of the surface have facets whose sides
are  of  equal  length.  The  objective  function  de¬ 
scribed  below  tends  to  maintain  this  equality, 
but  does  not  strictly  enforce  it.  The  representa¬ 
tion  can  be  extended  in  a  straight-forward  fash¬ 
ion  to  support  different  surface  resolutions  by 
sub-dividing  facets  (which  we  have  done).  How¬ 
ever,  facets  of  a  given  resolution  will  still  be  re¬ 
quired  to  have  approximately  equal  sides. 


4.3  Objective  Function 

The objective function $\mathcal{E}(S)$ that we use to recover
the surface is best described in two equations.
In the first equation,

\[ \mathcal{E}(S) = \lambda_D \mathcal{E}_D(S) + \mathcal{E}_G(S), \qquad (1) \]

$\mathcal{E}(S)$ is decomposed into a linear combination of
two components. The first component, $\mathcal{E}_D(S)$,
is  a  measure  of  the  deformation  of  the  surface 
from  a  nominal  shape,  and  is  independent  of  the 
images.  For  this  paper,  the  nominal  shape  is  a 
plane.  Higher-order  measures,  such  as  deforma¬ 
tion  from  a  sphere,  are  also  possible.  This  nom¬ 
inal  shape  represents  the  shape  that  the  surface 
would  take  in  the  absence  of  any  information 
from  the  images. 

The  second  component, 

\[ \mathcal{E}_G(S) = \lambda_C \mathcal{E}_C(S) + \lambda_S \mathcal{E}_S(S) \qquad (2) \]

depends  on  the  images,  and  is  the  one  that 
drives  the  reconstruction  process.  It  is  further 
decomposed  into  a  linear  combination  of  the  two 
information  sources  described  in  the  previous 
section: a multi-image correlation component,
$\mathcal{E}_C(S)$, and a component that depends on the
shading of the surface, $\mathcal{E}_S(S)$.

These  components,  and  their  relative  weights, 
are described in more detail below.


Figure 2: The top row shows a hexagonal mesh as both a wireframe and a shaded surface. The bottom row
shows several images of a scene. In our approach, these images are projected onto the mesh using camera
models.


4.3.1  Surface Deformation Component

As  stated  earlier,  the  surface  deformation  (or 
smoothness)  component  is  a  measure  of  the  de¬ 
viation  of  the  mesh  surface  from  some  nominal 
smooth  shape.  When  the  nominal  shape  is  a 
plane,  we  can  approximate  this  as  follows. 

Consider a perfectly planar hexagonal mesh
for which the distances between neighboring
vertices are exactly equal. Recall that the mesh
is defined so that the neighbors of a vertex $v_i$ are
ordered in a clock-wise fashion, and are denoted
$v_{N_i(j)}$. If the hexagonal mesh were perfectly planar,
then the third neighbor over from the $j$th
neighbor, $v_{N_i(j+3)}$, would lie on a straight line
with $v_i$ and $v_{N_i(j)}$. Given that the inter-vertex
distances are equal, this implies that the coordinates
of $v_i$ equal the average of the coordinates
of $v_{N_i(j)}$ and $v_{N_i(j+3)}$, for any $j$.

Given  the  above,  we  can  write  a  measure  of 
the  deviation  of  the  mesh  from  a  plane  as  fol¬ 
lows: 

\[
\mathcal{E}_D(S) = \sum_{i=1}^{n_v} \sum_{j=1}^{3}
\left[ (2x_i - x_k - x_{k'})^2 + (2y_i - y_k - y_{k'})^2 + (2z_i - z_k - z_{k'})^2 \right],
\qquad k = N_i(j), \quad k' = N_i(j+3).
\]

Note  that  this  term  is  also  equivalent  to 
the  squared  directional  curvature  of  the  sur¬ 
face  when  the  sides  have  approximately  equal 
lengths [27]. Also, this term can accommodate
multiple resolutions of facets by normalizing
each term by the nominal inter-vertex spacing
of the facets.
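
The deviation-from-plane measure can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the name `deformation_energy`, the list-based mesh layout, and the choice to skip vertices that do not have exactly six neighbors (the text does not say how boundary vertices are treated) are all introduced here.

```python
def deformation_energy(vertices, neighbors):
    """Sum, over vertices with six ordered neighbors, of the squared
    deviation of the vertex from the midpoint of each opposite pair.

    vertices:  list of (x, y, z) tuples
    neighbors: neighbors[i] is a list of six consistently ordered
               neighbor indices, or None for a boundary vertex
               (skipping boundary vertices is an assumption)
    """
    total = 0.0
    for i, nbrs in enumerate(neighbors):
        if nbrs is None or len(nbrs) != 6:
            continue
        for j in range(3):
            k, k_opp = nbrs[j], nbrs[j + 3]  # opposite neighbors
            for axis in range(3):
                d = (2 * vertices[i][axis]
                     - vertices[k][axis] - vertices[k_opp][axis])
                total += d * d
    return total
```

For a flat, symmetric patch the energy is zero; lifting the central vertex by h contributes (2h)^2 for each of the three opposite pairs.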

4.3.2  Multi-Image Intensity Correlation

The multi-image intensity correlation component
is the sum of squared differences in intensity
from all the images at a given sample-point
on  a  facet,  summed  over  all  sample-points,  and 
summed  over  all  facets.  This  component  is  pre¬ 
sented  in  stages  in  the  remainder  of  this  subsec¬ 
tion. 

First, we define the sample-points of a facet
by taking advantage of the fact that all points
on a triangular facet are a convex combination
of its vertices. Thus, we can write the sample-points
of facet $f_k$ as:

\[
\mathbf{x}_{k,l} = \lambda_{l,1} \mathbf{x}_{k,1} + \lambda_{l,2} \mathbf{x}_{k,2} + \lambda_{l,3} \mathbf{x}_{k,3},
\qquad l = 3, 4, \ldots, n_s,
\]

where $\mathbf{x}_{k,1}$, $\mathbf{x}_{k,2}$, and $\mathbf{x}_{k,3}$ are the coordinates of
the vertices of facet $f_k$, and $\lambda_{l,1} + \lambda_{l,2} + \lambda_{l,3} = 1$.
In the top half of Figure 3(a), we see an example
of the sample points of a facet.

Next, we develop the sum of squared differences
in intensity from all images for a given
point $\mathbf{x}$. Recall that a point $\mathbf{x}$ in space is projected
into a point $\mathbf{u}$ in image $g_i$ via the perspective
transformation $\mathbf{u} = m_i(\mathbf{x})$. Consequently,
the sum of squared differences in intensity from
all the images, $\sigma(\mathbf{x})$, is:

\[
\mu(\mathbf{x}) = \frac{1}{n_g} \sum_{i=1}^{n_g} g_i(m_i(\mathbf{x})),
\qquad
\sigma(\mathbf{x}) = \frac{1}{n_g} \sum_{i=1}^{n_g} \left( g_i(m_i(\mathbf{x})) - \mu(\mathbf{x}) \right)^2 .
\]

Figure 3(a) illustrates the projection of a
sample-point of a facet onto several images.

The above definition of $\sigma(\mathbf{x})$ does not take
into account occlusions of the surface. To do
so, we use a “Facet-ID” image shown in Figure 4.
It is generated by encoding the index
$i$ of each facet $f_i$ as a unique color, and projecting
the surface into the image plane using a
standard hidden-surface algorithm. Thus, when
a sample-point from facet $f_k$ is projected into
an image, the index $k$ is compared to the index
stored in the Facet-ID image at that point.
If they are the same, then the sample-point is
visible in that image; otherwise, it is not. Let
$v_i(\mathbf{x}) = 1$ when point $\mathbf{x}$ is determined to be
visible in image $g_i$ by the method above, and
$v_i(\mathbf{x}) = 0$ otherwise. Then, the correct form for
the sum of squared differences in intensity at a
point $\mathbf{x}$ is:

\[
\mu(\mathbf{x}) = \frac{\sum_{i=1}^{n_g} v_i(\mathbf{x})\, g_i(m_i(\mathbf{x}))}{\sum_{i=1}^{n_g} v_i(\mathbf{x})},
\qquad
\sigma(\mathbf{x}) = \frac{\sum_{i=1}^{n_g} v_i(\mathbf{x}) \left( g_i(m_i(\mathbf{x})) - \mu(\mathbf{x}) \right)^2}{\sum_{i=1}^{n_g} v_i(\mathbf{x})} .
\]

Finally, summing $\sigma(\mathbf{x})$ over all sample-points
and over all facets yields the multi-image intensity
correlation component:

\[
\mathcal{E}_C(S) = \sum_{k=1}^{n_f} c_k \sum_{l=3}^{n_s} \sigma(\mathbf{x}_{k,l}),
\]

where $c_k$ is a number between 0 and 1 that
weights the contribution from each facet differently,
depending on the average degree of texturing
within a facet (see Section 4.3.4).
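
The pieces of this component (sample-points as convex combinations of the facet vertices, a visibility flag per image, and a per-facet weight) can be sketched as below. All names are hypothetical. Two choices are assumptions made for this sketch rather than statements about the original method: the squared differences are divided by the number of images in which the point is visible, and a point seen in fewer than two images contributes nothing.

```python
def sample_point(v1, v2, v3, lam):
    """Convex combination of a facet's three vertices.
    lam = (l1, l2, l3) is assumed non-negative and to sum to 1."""
    return tuple(lam[0] * a + lam[1] * b + lam[2] * c
                 for a, b, c in zip(v1, v2, v3))

def intensity_spread(intensities, visible):
    """Mean squared deviation of the grey levels observed at one
    sample-point, using only images where the point is visible.
    The normalization by the visible count is an assumption."""
    seen = [g for g, vis in zip(intensities, visible) if vis]
    if len(seen) < 2:
        return 0.0  # assumption: no constraint from a single view
    mu = sum(seen) / len(seen)
    return sum((g - mu) ** 2 for g in seen) / len(seen)

def correlation_energy(facets):
    """facets: list of (c_k, samples), where samples is a list of
    (intensities, visible) pairs, one pair per sample-point."""
    return sum(c_k * sum(intensity_spread(g, v) for g, v in samples)
               for c_k, samples in facets)
```

In a full implementation the `visible` flags would come from comparing the facet index with the Facet-ID image at the projected pixel, and the intensities from interpolating each image at that pixel.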

When the original surface giving rise to the
images is sufficiently textured, this component
should be smallest when the surface $S$ closely
approximates the original surface. However,
when  the  surface  has  constant,  or  nearly  con¬ 
stant,  albedo  this  component  would  be  small 
for  many  different  surfaces.  As  an  extreme  ex¬ 
ample  of  this  ambiguity,  consider  a  planar  sur¬ 
face  with  constant  albedo.  This  produces  im¬ 
ages  with  constant  intensity.  Thus,  this  compo¬ 
nent  will  not  be  able  to  constrain  the  shape  of 
the  surface,  since  the  difference  in  intensity  will 
be  zero  for  all  surfaces. 

An  example  of  using  only  the  intensity- 
correlation  and  smoothness  components  on  the 
synthetic  stereo  pair  of  Figure  1  is  shown  in  Fig¬ 
ure  5.  The  top  row  of  the  figure  depicts  the 
initial  surface  estimate.  Figures  5(a)  and  (b) 
are  shaded  images  of  the  mesh.  Figure  5(c)  de¬ 
picts  the  error  from  ground-truth  elevation  for 
the  left  image,  where  black  indicates  zero  error, 
and  white  indicates  an  error  corresponding  to  a 
few  pixels  in  disparity.  Figure  5(d)  depicts  the 
squared  difference  in  intensity  between  the  left 
image and the right image warped using the
disparity  map.  Note  that  the  worst  errors  occur 


Figure 3: (a) Facets are sampled at regular intervals as illustrated here. We use the grey levels of the
projections of these sample points to compute the stereo score. (b) The albedo of each facet is estimated
using the facet normal N, the light source direction L, and the average grey level of the projection of the
facet into the images.


Figure 4: Illustration of the projection of a mesh, and the “Facet-ID” image used to accommodate occlusions
during surface reconstruction. (a) A shaded image of a mesh. (b) A wire-frame representation of the mesh
(bold white lines) and the sample-points in each facet (interior white points). (c) The “Facet-ID” image,
wherein the color at a pixel is chosen to uniquely identify the visible facet at that point (shown here as a
grey-level image).


along  the  steep  ridge  of  the  terrain,  where  the 
constant-depth  assumption  of  correlation-based 
stereo  is  most  strongly  violated. 


The  bottom  row  of  Figure  5  illustrates  the  re¬ 
sult  of  the  optimisation  procedure,  described  in 
Section  4.4,  using  only  the  intensity-correlation 
and  smoothness  components.  Note  that  the 
overall  error  in  both  elevation  and  intensity  is 
lower,  and  that  the  error  is  no  longer  concen¬ 
trated  along  the  ridge.  As  a  result,  the  ridge  is 
clearly  sharper  in  the  shaded  views. 


4.3.3  Shading 

The  shading  component  of  the  objective  func¬ 
tion  is  the  sum,  over  all  facets,  of  the  difference 
between  the  computed  albedo  of  the  facet  and 
the  computed  albedoes  of  all  of  its  neighbors. 
The motivation for this component, and its precise
form, follow.

Recall that the Lambertian reflectance model
defines the intensity $g$ at a point on a surface
with a unit surface normal $N$ as:

\[ g = \alpha \, (a + b \, N \cdot L), \qquad (3) \]


Figure  5:  (a,b)  Two  shaded  views  of  the  mesh  derived  from  the  smoothed  disparity  map  of  Figure  1(d).  (c) 
Deviations  in  altitude  from  the  elevation  data  used  to  generate  the  synthetic  pair,  (d)  Intensity  error  image, 
created by warping the right image into the left image using the disparities corresponding to the elevations
of the mesh facets and computing the squared difference between these two images. (e,f,g,h) Corresponding
images  after  stereo  optimisation.  Note  that  the  ridge  now  appears  much  sharper  in  the  shaded  views,  and 
that  the  overall  error  is  smaller  and  more  evenly  distributed. 


where $\alpha$ is the albedo of the surface, $a$ is the
magnitude of the ambient light, $b$ is the magnitude
of a point light source, and $L$ is the
direction  of  the  point  light  source  as  depicted 
in  Figure  3(b). 

Note  that  g  is  independent  of  the  viewing  di¬ 
rection.  Consequently,  if  we  were  to  image  a 
planar  Lambertian  facet  from  several  points  of 
view,  its  intensity  would  be  the  same  for  all  pix¬ 
els  in  the  projection  of  the  facet.  Conversely,  if 
we were to measure the average intensity $g_k$ of
all of the pixels within the projection of a facet
$f_k$, we could compute its albedo, $\alpha_k$, as follows:

\[ \alpha_k = \frac{g_k}{a + b \, N \cdot L}. \qquad (4) \]

This  assumes,  of  course,  that  the  facet  is  well- 
modeled  by  a  single  albedo,  and  that  the  vari¬ 
ation  in  intensity  is  due  only  to  noise.  In  this 
paper,  we  assume  that  the  ambient  and  direct 

illumination (i.e., $a$, $b$, and $L$) are given,
although some of these parameters could be included
in the optimization, as was done in [31].

The average intensity $g_k$ of a facet is computed
by scanning over all the Facet-ID images
for  index  k,  and  taking  the  average  of  the  inten¬ 
sities  at  matching  points  in  the  corresponding 
images.  This  method  provides  an  inexpensive 
way  of  computing  the  average  intensity  while 
taking  occlusions  into  account. 
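
The albedo computation for one facet can be sketched as follows, assuming the mean grey level of the facet has already been gathered from the Facet-ID images. The function name and the parameter names `ambient` and `direct` (standing for the magnitudes of the ambient and point light) are hypothetical, and the guard against a non-positive shading value is an addition made here.

```python
def facet_albedo(mean_intensity, normal, light, ambient, direct):
    """Invert the Lambertian model for one planar facet:
    albedo = mean intensity / (ambient + direct * (N . L)).
    normal and light are assumed to be unit 3-vectors."""
    n_dot_l = sum(n * l for n, l in zip(normal, light))
    shading = ambient + direct * n_dot_l
    if shading <= 0:
        raise ValueError("facet receives no light under this model")
    return mean_intensity / shading
```

A facet facing the light directly, with `ambient=0.2` and `direct=0.8`, has a shading value of 1, so its albedo equals its mean grey level.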

Now,  if  the  original  surface  had  exactly  con¬ 
stant  albedo,  and  if  our  mesh  surface  were 
a  good  approximation  to  the  original  surface, 
then  the  computed  albedoes  should  be  approx¬ 
imately  the  same  across  all  facets.  Thus,  some 
measure  of  the  variation  in  computed  albedoes 
would  be  a  good  measure  of  the  correctness  of 
the  mesh  surface.  If  the  albedo  varies  slowly 
across  the  surface,  we  propose  that  an  appro¬ 
priate  measure  of  this  variation  is  the  difference 
between  the  computed  albedo  at  the  facet  and 
the  computed  albedoes  of  all  of  its  neighbors: 


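\[
\mathcal{E}_S(S) = \sum_{k=1}^{n_f} (1 - c_k) \sum_{j \in N_f(k)} (1 - c_j)(\alpha_k - \alpha_j)^2,
\]

where $N_f(k)$ is the set of indices of the facets
that are neighbors of facet $f_k$, and $c_k$ and $c_j$ are
numbers between 0 and 1 that depend on the
degree of texturing within facets $f_k$ and $f_j$.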
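
A minimal sketch of this albedo-variation measure is given below, following the form stated in Section 4.3.4: squared albedo differences between neighboring facets, each damped by one minus the texture weight of both facets. The function name and the list-based layout are assumptions.

```python
def shading_energy(albedos, weights, facet_neighbors):
    """Sum over facets k and their neighbors j of
    (1 - c_k) * (1 - c_j) * (albedo_k - albedo_j) ** 2.

    albedos:         computed albedo of each facet
    weights:         texture weight c_k of each facet, in [0, 1]
    facet_neighbors: facet_neighbors[k] lists the neighbor indices of k
    """
    total = 0.0
    for k, nbrs in enumerate(facet_neighbors):
        for j in nbrs:
            total += ((1 - weights[k]) * (1 - weights[j])
                      * (albedos[k] - albedos[j]) ** 2)
    return total
```

Each neighboring pair is visited from both sides, as in the double sum, and a fully textured facet (weight 1) drops out of the shading term entirely.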

An  example  of  using  only  the  shading  and 
smoothness components is illustrated in Figure 6.
Figure 6(a) shows a shaded view of
the  original  surface,  a  hemisphere  with  constant 
albedo.  Figures  6(b)  and  (c)  show  shaded  views 
of  the  initial  surface  estimate,  which  was  de¬ 
rived  by  adding  white  noise  to  the  vertex  co- 
ordinates  of  the  original  surface.  Figures  6(d) 
and (e) are the shaded views of the result after
optimization, and Figure 6(f) is the albedo
map for the surface, i.e., the intensity in the image
represents the albedo of the surface. Note
that the albedo and shape are well recovered except
near the edge of the hemisphere where the
image intensity varies rapidly across the image.
This is because the approximation we use in the
derivatives of this component is that the mean
intensity within a facet does not vary significantly
in the neighborhood of a facet, which is
violated for facets that straddle the boundary.
This does not hurt us when combining shading
with the stereo component since, as explained in
the following subsection, we turn off the shading
component  in  such  areas. 

4.3.4  Combining the Components

Recall that the objective function $\mathcal{E}(S)$ is a linear
combination of three components:

\[ \mathcal{E}(S) = \lambda_D \mathcal{E}_D(S) + \lambda_C \mathcal{E}_C(S) + \lambda_S \mathcal{E}_S(S), \]

where the last two components are themselves
linear combinations of subcomponents computed
on a per-facet basis:

\[ \mathcal{E}_C(S) = \sum_{k=1}^{n_f} c_k \sum_{l=3}^{n_s} \sigma(\mathbf{x}_{k,l}), \qquad (5) \]

\[ \mathcal{E}_S(S) = \sum_{k=1}^{n_f} (1 - c_k) \sum_{j \in N_f(k)} (1 - c_j)(\alpha_k - \alpha_j)^2 . \]

Thus, one needs to specify both the $\lambda$s, defining
the relative weights of the components, and the
$c_k$s, defining the relative weights of the facets in
each of these components.

The $\lambda$ weights are defined as follows:

\[
\lambda_D = \frac{\lambda'_D}{\| \nabla \mathcal{E}_D(S^0) \|}, \qquad
\lambda_C = \frac{\lambda'_C}{\| \nabla \mathcal{E}_C(S^0) \|}, \qquad
\lambda_S = \frac{\lambda'_S}{\| \nabla \mathcal{E}_S(S^0) \|}, \qquad (6)
\]

where $S^0$ is the initial estimate of the surface,
and the $\lambda'$s are user defined weights. Normalizing
each component by the magnitude of its
initial gradient allows the components to have
roughly the same influence when the $\lambda'$s are
equal. Thus, the user can more easily specify
the relative contributions of each component in
an image-independent fashion. This normalization
scheme was used with great success in [16],
and is analogous to standard constrained optimization
techniques in which the various constraints
are scaled so that their eigenvalues have
comparable magnitudes [34].

As mentioned earlier, the $c_k$ weights are a
function of the degree of texturing in the intensities
projected within a facet $f_k$. A simple
measure of the degree of texturing within a
facet is the variance in intensity of all the pixels
projecting onto the facet, denoted $\sigma_k(S)$ (using
the Facet-ID image to accommodate occlusions).
We have found that using the logarithm
of $\sigma_k(S)$ yields the most stable results:

\[ c_k = a \log(1 + \sigma_k(S)) + b, \qquad (7) \]

where $a$ and $b$ are normalizing factors chosen so
that the smallest $c_k$ is zero, and the largest is
one.
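
Both weighting rules of this subsection can be sketched briefly. The names below are hypothetical. The handling of a mesh whose facets all have the same variance, and of a component whose initial gradient is zero, is not covered by the text; the choices made here are assumptions.

```python
import math

def texture_weights(variances):
    """Map per-facet intensity variances to weights in [0, 1] using
    c_k = a * log(1 + variance_k) + b, with a and b chosen so that
    the smallest weight is 0 and the largest is 1."""
    logs = [math.log(1 + s) for s in variances]
    lo, hi = min(logs), max(logs)
    if hi == lo:
        return [0.0] * len(logs)  # assumption: no texture contrast at all
    return [(v - lo) / (hi - lo) for v in logs]

def normalized_lambdas(user_weights, initial_gradients):
    """Divide each user-defined weight by the Euclidean norm of that
    component's gradient at the initial surface estimate.

    user_weights:      e.g. {"D": 0.2, "C": 0.4}
    initial_gradients: flat gradient vector per component name
    """
    result = {}
    for name, w in user_weights.items():
        norm = math.sqrt(sum(g * g for g in initial_gradients[name]))
        if norm == 0:
            raise ValueError("zero initial gradient for component " + name)
        result[name] = w / norm
    return result
```

Here the rescaling `(v - lo) / (hi - lo)` corresponds to `a = 1 / (hi - lo)` and `b = -lo / (hi - lo)` in the expression for the weights.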

4.4  The  Optimization  Procedure 

The purpose of the optimization procedure is to
iteratively modify the surface $S$ so as to minimize
$\mathcal{E}(S)$, given some initial estimate $S^0$, and
some value for the weights $\lambda'_S$, $\lambda'_C$, and $\lambda'_D$
(where $\lambda'_S + \lambda'_C + \lambda'_D = 1$) defined in Equation 7.
Ideally, one would like to use as small a
value of the deformation weight $\lambda'_D$ as possible
so as to minimize the bias introduced by this
term. However, in practice, $\lambda'_D$ serves a dual
purpose. First, since the surface deformation
term is a quadratic function of the vertex coordinates,
it “convexifies” the energy landscape
and improves the convergence properties of the
optimization procedure. Second, as will be discussed
in the results section, in the absence of
a smoothing term, the objective function may
overfit the data and wrinkle the surface excessively.
Furthermore, the $c_k$ weights of Equations
6 and 7 are computed for the initial position of
the mesh and are only meaningful when it is
relatively close to the actual surface.

Figure 6: (a) Shaded image of a hemisphere of constant albedo. (b,c) Shaded views of randomised hemisphere
used as a starting point. (d,e) Shaded views of the same hemisphere after optimisation using only the shading
component of the objective function. (f) The recovered albedo map.

Consequently,  we  use  an  optimization  method 
that  is  inspired  by  the  heuristic  technique 
known  as  a  continuation  method  [43, 28, 29, 30]. 
We first “turn off” the shading term by setting
$\lambda'_S$ (equation 7) to 0 and set $\lambda'_D$ to a value that
is large enough to sufficiently convexify the energy
landscape but small enough to allow curvature
in the surface. In this paper we take the
initial value of $\lambda'_D$ to be 0.5. Given the initial
estimate $S^0$, a local minimum of this approximate
objective function is found using a standard
optimization procedure. Then, $\lambda'_D$ is decreased
slightly, and the optimization procedure
is applied again, starting at the local minimum
found for the previous approximation. This cycle
is repeated until $\lambda'_D$ is decreased to the desired
value. Finally we “turn on” the shading
term, compute the $c_k$ weights and reoptimize.
In  all  examples  shown  in  the  result  section  we 
use  =  A5  =  .4  and  A^  =  .2. 
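
The continuation schedule can be illustrated on a toy problem: a 1-D chain of heights fitted to observations, with a squared second-difference smoothness term standing in for the deformation component. This is only a sketch under stated assumptions. Plain gradient descent replaces the conjugate-gradient procedure, the data term is a simple squared difference rather than the stereo component, and the schedule, step size, and all names are hypothetical.

```python
def smooth_grad(z):
    """Gradient of sum_i (2 z_i - z_{i-1} - z_{i+1})^2 over interior i."""
    g = [0.0] * len(z)
    for i in range(1, len(z) - 1):
        d = 2 * z[i] - z[i - 1] - z[i + 1]
        g[i] += 4 * d
        g[i - 1] -= 2 * d
        g[i + 1] -= 2 * d
    return g

def data_grad(z, obs):
    """Gradient of sum_i (z_i - obs_i)^2."""
    return [2 * (a - b) for a, b in zip(z, obs)]

def continuation(z0, obs, lam_start=0.5, lam_end=0.2,
                 stages=4, steps=200, rate=0.02):
    """Minimize lam * smoothness + (1 - lam) * data for a decreasing
    sequence of lam values, warm-starting each stage from the
    minimum found in the previous one."""
    z = list(z0)
    for s in range(stages):
        lam = lam_start + (lam_end - lam_start) * s / (stages - 1)
        for _ in range(steps):
            gs, gd = smooth_grad(z), data_grad(z, obs)
            z = [zi - rate * (lam * a + (1 - lam) * b)
                 for zi, a, b in zip(z, gs, gd)]
    return z
```

Starting each stage from the previous minimum is the point of the scheme: the heavily smoothed early stages give a well-behaved landscape, and later stages refine the result with less bias.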

The stereo component effectively uses only
zeroth-order information about the surface (i.e.,
the position of the vertices), whereas shading
uses first-order information about the surface
(i.e., its surface normals). Thus, by optimizing
the stereo component first, we effectively
compute the zeroth-order properties of the
surface and set up boundary conditions that the
shading component can then use to compute the
first-order properties of the surface in textureless
regions. In section 5, we will show that this
leads to a significant improvement over using
the stereo component alone.

When dealing with surfaces for which motion
in one direction leads to more dramatic changes
than motions in others, as is typically the case
with the z direction in Digital Elevation Models
(DEMs), we have found the following
heuristic to be useful. We first fix the x and y
coordinates of vertices and adjust z alone. Once
the surface has been optimized, we then allow all
of the coordinates to vary simultaneously.

The optimization procedure we use at every
stage is a standard conjugate-gradient descent
procedure called FRPRMN (from [38]) in
conjunction with a simple line search algorithm.
The conjugate-gradient procedure requires
three inputs: 1) a function that returns
the value of the objective function for any $S$, 2)
a function that returns the gradient of $\mathcal{E}(S)$,
i.e., a vector whose elements are the partial
derivatives of $\mathcal{E}(S)$ with respect to the vertex
coordinates, evaluated at $S$, and 3) an initial
estimate $S^0$.

The gradient of $\mathcal{E}(S)$ is conceptually straightforward,
but is fairly complicated to derive manually.
We have used the Maple¹ mathematical
package  to  derive  some  of  the  terms.  We  sum¬ 
marize  the  calculation  of  the  derivatives  below 
in  general  terms. 

The  derivatives  of  the  stereo  term  are  lin¬ 
ear  combinations  of  image  intensity  derivatives 
and  of  derivatives  of  the  3-D  projections  of 
points  onto  the  images.  Since  we  use  bilinear- 
interpolation  of  image  values,  the  first  deriva¬ 
tives  of  image  intensity  are  linear  combinations 
of the image intensities in the immediate neighborhood
of the projection. Since sample-points
are linear combinations in projective space of
the mesh vertices, their projections are ratios
of linear combinations of the projections of
the vertices, which themselves depend linearly
on the vertex coordinates. Consequently, the
derivatives  of  these  projections  are  ratios  of  lin¬ 
ear  combinations  of  the  vertex  coordinates  and 
squares  of  linear  combinations  of  the  vertex  co¬ 
ordinates. 
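
As a worked instance of this statement, consider the derivative of a projected coordinate with respect to one coordinate of the 3-D point. The entry notation $M_{rc}$ for row $r$ and column $c$ of the projection matrix is introduced here for illustration only.

```latex
\[
U = M_{11}x + M_{12}y + M_{13}z + M_{14}, \qquad
V = M_{21}x + M_{22}y + M_{23}z + M_{24}, \qquad
W = M_{31}x + M_{32}y + M_{33}z + M_{34},
\]
\[
\frac{\partial u}{\partial x}
  = \frac{\partial}{\partial x}\!\left(\frac{U}{W}\right)
  = \frac{M_{11} W - M_{31} U}{W^{2}}, \qquad
\frac{\partial v}{\partial x}
  = \frac{M_{21} W - M_{31} V}{W^{2}} .
\]
```

The numerator is a linear combination of the point coordinates and the denominator is the square of one, which is the ratio structure described above. For a sample-point that is a convex combination of facet vertices, the chain rule scales these derivatives by the corresponding combination coefficient.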

Similarly, the derivatives of the shading term
depend on the derivatives of the surface normal,
which can be easily derived analytically,
and on the derivative of the mean grey-level
in the facets. In this work, the shading term is
used mainly in the fairly uniform areas where
the latter derivative is assumed to be small and
therefore neglected.

¹Trademark, Waterloo Maple Software

5  Behavior  of  the  Objective 
Function  and  Results 

In  previous  sections,  we  have  shown  results  of 
the  optimization  procedure  using  only  one  or 
the  other  of  the  image  components  of  the  objec¬ 
tive  function.  In  this  section,  we  first  illustrate 
the  behavior  of  the  complete  objective  function 
using  synthetic  data.  We  then  show  that  the 
same  behavior  can  be  observed  with  real  data, 
allowing us to generate accurate 3-D reconstructions
of real surfaces from multiple images.

5.1  Synthetic  Data 

To  demonstrate  the  properties  of  the  objective 
function of Equation 1 and the influence of the
coefficients  defined  in  Equation  4,  we  use  as  in¬ 
put  the  five  synthetic  images  of  a  shaded  hemi¬ 
sphere  with  variable  albedo  shown  at  the  bot¬ 
tom  of  Figure  7,  both  with  and  without  the  ad¬ 
dition  of  white  noise.  Each  column  of  the  figure 
illustrates  the  steps  used  in  the  creation  of  the 
image  at  the  bottom  of  the  column.  We  be¬ 
gin  with  a  mesh  and  an  albedo  map,  shown  in 
the  top  row.  Then,  for  each  view,  two  images 
are  produced.  The  first  image  (second  row  of 
the  figure)  is  the  albedo  map  texture-mapped 
onto  the  mesh  from  the  final  image’s  point  of 
view. The second image (third row of the figure)
is a shaded view of the mesh, using a constant
albedo equal to one. The final image is the
point-by-point  product  of  these  two  images  be¬ 
cause,  by  Equation  3,  the  imaged  intensity  of  a 
Lambertian  surface  is  the  product  of  the  albedo 
(first  image)  and  the  inner  product  of  the  light 
source  and  the  surface  normal  (second  image). 
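
The construction of a synthetic image as a point-by-point product can be sketched as follows. The function name and the nested-list image layout are assumptions, and clamping negative values of the inner product to zero for surface elements facing away from the light is an addition made here that the text does not mention.

```python
def render(albedo_map, normals, light, ambient=0.0, direct=1.0):
    """Per-pixel Lambertian image: albedo times the shading value
    ambient + direct * (N . L).

    albedo_map: 2-D list of albedos as seen from the chosen viewpoint
    normals:    2-D list of unit surface normals for the same pixels
    light:      unit vector toward the point light source
    """
    image = []
    for albedo_row, normal_row in zip(albedo_map, normals):
        row = []
        for alpha, n in zip(albedo_row, normal_row):
            n_dot_l = sum(c * l for c, l in zip(n, light))
            shade = ambient + direct * max(0.0, n_dot_l)  # clamp: assumption
            row.append(alpha * shade)
        image.append(row)
    return image
```

With a constant albedo of one this reproduces the shaded view alone, and with a shading value of one everywhere it reproduces the texture-mapped albedo view.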

Figure  8  depicts  graphically  the  result  of  our 
experiments.  In  each  experiment  we  random¬ 
ized  the  mesh  by  adding  random  numbers  to 
the  coordinates  of  the  mesh  vertices,  and  added 
different  amounts  of  noise  to  the  input  images. 


Figure 7: The making of synthetic images of a shaded hemisphere with variable albedo that conforms to
our Lambertian model.


We  then  used  our  optimization  procedure  to  es¬ 
timate  the  true  hemispherical  shape  and  true 
albedo map. More precisely, starting from our
randomized  initial  estimate,  we  first  use  stereo 
alone  and  progressively  decrease  the  value  of 
the $\lambda'_D$ parameter of Equation 7 from 0.5 to
0.  We  then  turn  on  the  shading  term  by  set¬ 
ting  both  Ap  and  A^  to  0.4,  compute  the  c^s 
of  Equation  7  and  optimize  the  full  objective 
function.  To  show  the  stability  of  the  process, 
we then recompute the $c_k$'s for the optimized
mesh  and  perform  a  second  optimization  using 
the  updated  values. 

The  first  column  of  Figure  8  is  for  experi¬ 
ments  using  only  the  first,  second,  and  third  im¬ 
ages  from  Figure  7,  where  there  is  little  self  oc¬ 
clusion.  The  second  column  is  for  experiments 
using  the  first,  fourth,  and  fifth  images,  where 


there  is  a  significant  amount  of  self  occlusion. 
Finally,  the  third  column  is  for  experiments  us¬ 
ing  all  five  images.  In  this  particular  set  of  ex¬ 
periments,  we  fixed  the  boundaries  of  the  mesh 
and  allowed  only  the  z  coordinates  of  the  ver¬ 
tices  to  vary.  However,  the  same  overall  be¬ 
haviors  can  be  observed  without  the  boundary 
conditions. 

The  first  row  from  the  top  of  Figure  8  is 
a graph of the average squared error in elevation
(the ordinate) versus decreasing λ_D (the
abscissa). To the left of the dotted vertical
line,  only  the  intensity  correlation  component  is 
used.  To  the  right,  both  the  intensity  correla¬ 
tion  and  shading  components  are  used.  The  dif¬ 
ferent  curves  are  for  different  amounts  of  noise 
in  the  input  images.  The  bottom  curve  is  when 
there  is  no  noise  (other  than  quantization  error), 


Figure  8:  Graphs  of  the  errors  and  objective  function  components  while  fitting  a  surface  model  to  the 
synthetic shaded hemisphere images of Figure 7 (these graphs are explained in detail in the text). (a,b,c)
Average  error  in  recovered  elevation.  (d,e,f)  Average  error  in  recovered  albedo.  (g,h,i)  Stereo  component  of 
the  energy.  (j,k,l)  Shading  component  of  the  energy. 


the  middle  curve  is  for  a  noise  variance  of  4% 
of  the  image  dynamic  range,  and  the  top  curve 
is for a noise variance of 8%. The short vertical
lines along the curves indicate the standard
deviation  of  the  average  error  over  the  20  exper¬ 
iments  performed  to  derive  each  curve. 
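
As a small illustration of how each point on such a curve and its error bar might be computed, the sketch below averages a per-trial squared elevation error over a set of trials. The function names are hypothetical, and the use of the population (rather than sample) standard deviation is an assumption, since the text does not say which was used.

```python
import statistics

def mean_squared_error(estimated, truth):
    """Average squared difference between recovered and true elevations."""
    if len(estimated) != len(truth):
        raise ValueError("elevation lists must have the same length")
    return sum((e - t) ** 2 for e, t in zip(estimated, truth)) / len(truth)

def summarize(per_trial_errors):
    """Mean and population standard deviation over the trials of one curve point."""
    return (statistics.mean(per_trial_errors),
            statistics.pstdev(per_trial_errors))

# Two toy trials against the same ground truth.
truth = [0.0, 0.0, 0.0]
trials = [[1.0, 1.0, 1.0], [1.0, 2.0, 2.0]]
errors = [mean_squared_error(t, truth) for t in trials]
mean_error, error_std = summarize(errors)
```

In the experiments above, the list of per-trial errors would hold 20 entries for each noise level and each stage of the schedule.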

The  second  row  of  Figure  8  is  a  graph  of  the 
average  error  in  computed  albedo.  The  third 
row  is  the  average  value  of  the  intensity  corre¬ 
lation  component,  Sc(S),  and  the  fourth  row 
is the average value of the shading component.


Note that, as λ_D decreases and stereo alone
is used (i.e., as the abscissa is traversed rightwards
to the dotted vertical line), the average
elevation  error  decreases  when  there  is  no  noise 
in  the  input  image  (bottom  curve),  as  does 
the  average  albedo  error  and  the  two  compo¬ 
nents  of  the  objective  function.  However,  when 
the  images  are  noisy,  the  elevation  error  (first 
row)  stops  decreasing  and  may  even  begin  to 
increase  as  we  start  fitting  to  the  grey-level 


noise,  even  though  the  value  of  the  intensity 
correlation  component  (third  row)  continues  to 
decrease  (as  it  must).  Furthermore,  both  the 
albedo  error  (second  row)  and  the  shading  com¬ 
ponent  (fourth  row)  also  begin  to  increase  when 
the  elevation  error  does.  This  is  natural  since 
for smaller values of λ_D the surface becomes
rougher and its normals less well-behaved. As
a result, the estimated albedos of Equation 4
become  less  reliable  and  noisier. 

In  other  words,  an  increase  in  the  shading 
component  provides  us  with  a  warning  that  we 
are starting to overfit the data. This is a valuable
behavior in itself. Furthermore, by turning
on  the  shading  component  of  our  objective  func¬ 
tion  (those  parts  of  the  graphs  that  are  to  the 
right  of  the  vertical  dotted  line),  we  can  bring 
down both the error in albedo and the value
of the shading component with at worst a modest
increase in the value of the stereo component,
resulting  in  an  overall  reduction  of  the  elevation 
error.  Even  when  there  is  nothing  but  quanti¬ 
zation  noise  in  the  image,  the  addition  of  the 
shading  component  can  make  a  small,  but  still 
noticeable  difference.  The  reasons  for  this  are 
twofold: 

1.  The  shading  component  averages  over 
whole  facets  and  is  therefore  less  sensitive 
to  uncorrelated  noise. 

2.  The  shading  component  uses  absolute  in¬ 
tensity  values  whereas  the  stereo  compo¬ 
nent  uses  intensity  differences.  Thus,  in  the 
presence  of  noise  in  textureless  areas,  the 
signal-to-noise  ratio  for  the  absolute  values 
(used  by  the  shading  component)  is  larger 
than  for  the  differences  (used  by  the  stereo 
component),  thereby  making  the  shading 
term  more  robust. 
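
The two reasons above can be made concrete with a short worked calculation. The noise model is an assumption of this sketch and is not stated in the text: each measured intensity is I_k = s_k + n_k, where the n_k are independent, zero-mean, and have variance σ².

```latex
% Assumed model: I_k = s_k + n_k, n_k independent, E[n_k] = 0, Var(n_k) = \sigma^2.
\begin{align*}
\text{absolute value:}\quad
  & \mathrm{SNR}_{\mathrm{abs}} = \frac{s^2}{\sigma^2}, \\
\text{difference of two intensities:}\quad
  & I_1 - I_2 = (s_1 - s_2) + (n_1 - n_2), \qquad
    \operatorname{Var}(n_1 - n_2) = 2\sigma^2, \\
  & \mathrm{SNR}_{\mathrm{diff}} = \frac{(s_1 - s_2)^2}{2\sigma^2}, \\
\text{average over } N \text{ samples of a facet:}\quad
  & \operatorname{Var}\Bigl(\frac{1}{N}\sum_{k=1}^{N} n_k\Bigr) = \frac{\sigma^2}{N}.
\end{align*}
```

In a textureless area the signal difference s_1 - s_2 is small while s itself is not, so under this model the difference has a much lower signal-to-noise ratio than the absolute value. Averaging over the N samples of a facet reduces the noise variance by a further factor of N.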

However,  in  our  experience,  the  shading  term 
can only be used reliably when the surface is
relatively close to the correct answer. This is not
surprising since stereo deals directly with
elevations whereas shading deals with derivatives
of elevation. Consequently, we have chosen the
optimization "schedule" described above, where
we  first  optimize  using  stereo  alone  and  turn  on 
shading  only  later. 


There  is  another  important  point  to  note 
about these results. The elevation errors in the
second column, i.e., those generated using images 1,
4, and 5 with a lot of self-occlusion, are very close
to those of the first column, i.e., those generated
using images 1, 2, and 3 with little self-occlusion,
while those in the final column (using all five
images) are significantly better. Furthermore, in
this particular case, the results for images 1, 4,
and 5 are even slightly better than those for
images 1, 2, and 3 in the presence of noise because
the former correspond to larger baselines.
In other words, having the same number of images,
but with significant self-occlusions, does
not hurt our procedure. Moreover, adding new
images that contain significant self-occlusions
actually improves the results.

We  now  turn  to  real  images  and  show  that 
the  same  properties  can  also  be  observed  there. 

5.2  Real  Images 

In  Figure  9  we  show  the  result  of  running  the 
stereo  component  of  our  objective  function  on  a 
real  stereo  pair  corresponding  to  the  same  site 
as  the  synthetic  images  of  Figure  1.  Note  that 
the radiometry of the left and right images is
actually slightly different. We correct for this
by first band-passing each image by taking the
difference between the image and its Gaussian
convolution. This is approximately equivalent
to replacing the simple correlation that our
objective function uses by a normalized correlation,
but is computationally more efficient. We
then applied the optimization using exactly the
same schedule and parameters as in the synthetic
case, with the exception that λ_D is not
reduced quite as much for the real images as
for the synthetic ones in the first step of the
procedure. Note that the recovered ridge is
even  sharper  than  in  the  synthetic  case.  This 
is  because  the  Digital  Elevation  Model  used  to 
produce  the  synthetic  right  image  was  actually 
a  slightly  smoothed  version  of  the  terrain,  in 
which  one  side  of  the  ridge  is  an  almost  verti¬ 
cal  cliff.  Thus,  even  though  we  do  not  currently 
have  ground  truth  for  the  real  case,  the  sharp¬ 
ness  of  the  recovered  cliff,  which  matches  what 
is  seen  using  a  stereoscope,  leads  us  to  believe 
that the algorithm has performed well.
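
The radiometric correction mentioned above (subtracting from each image its Gaussian convolution) can be sketched as follows. This is a minimal pure-Python illustration, not the implementation used for the experiments: the kernel radius of three standard deviations, the replicated borders, and the default sigma are all assumptions.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_1d(values, kernel):
    """Convolve a 1-D signal with the kernel, replicating border values."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out

def gaussian_blur(image, sigma):
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def band_pass(image, sigma=2.0):
    """Difference between the image and its Gaussian convolution."""
    blurred = gaussian_blur(image, sigma)
    return [[p - b for p, b in zip(row, brow)]
            for row, brow in zip(image, blurred)]

left = [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [1.0, 0.0, 1.0, 0.0]]
brighter = [[p + 10.0 for p in row] for row in left]  # same scene, offset
filtered_left = band_pass(left)
filtered_brighter = band_pass(brighter)
```

Because the kernel sums to one, a constant brightness offset between two images cancels in the filtered result, which is the property the correction relies on.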


Figure  9:  (a,b)  A  stereo  pair  of  real  images  of  the  Martin-Marietta  ALV  test-site  used  in  Figure  1.  (c) 
Intensity error image computed using the method described in Figure 1(c). (d,e) Shaded views of the mesh
after optimization. (f) Intensity error image after optimization. Note that the ridge is now very sharp. This
corresponds  accurately  to  the  almost  vertical  cliff  that  can  be  seen  when  viewing  the  stereo  pair  with  a 
stereoscope. 


In  Figure  10  we  show  three  triplets  of  images 
of faces. They have been produced using the INRIA
three-camera system [13] that provides us
with  the  3  by  4  projection  matrices  we  need  to 
perform  our  computations.  In  this  case  it  is  es¬ 
sential  to  have  more  than  two  images  to  be  able 
to  reconstruct  both  sides  of  the  face  because  of 
self-occlusions.  For  each  triplet,  we  have  com¬ 
puted  disparity  maps  corresponding  to  images  1 
and  2  and  to  images  1  and  3  and  combined  them 
to  produce  the  depth  maps  shown  in  the  right¬ 
most  column  of  the  figure  using  the  algorithms 
described in [19, 15].

The  depth  maps  have  then  been  smoothed 
and  triangulated  to  produce  the  initial  surfaces 
shown  in  the  upper  left  corner  of  Figures  11, 
12,  and  13.  In  the  first  row  of  these  three  fig¬ 
ures,  we  show  the  result  of  the  optimization 
using  stereo  alone  as  we  progressively  decrease 
the  smoothness  constraint  and  allow  all  three 
vertex  coordinates  to  be  adjusted.  Note  that 
for  the  first  two  triplets  (Figures  11  and  12), 
we  recover  more  and  more  detail  until  the  sur¬ 


face  eventually  starts  to  wrinkle,  without  appar¬ 
ent  improvement  in  accuracy.  The  third  triplet 
poses  an  even  more  difficult  problem:  there  are 
strong  specularities  on  both  the  forehead  and 
the  nose  that  strongly  violate  our  Lambertian 
model.  Because  there  are  very  few  other  points 
that  can  be  matched  on  the  nose,  the  algorithm 
latches  on  to  these  specularities  and  yields  a 
poor  result. 

In  the  bottom  row  of  Figures  11,  12,  and 
13,  we  show  our  final  results  obtained  by  turn¬ 
ing  on  the  shading  term  and  reoptimizing  the 
meshes. For these images we did not know the
light source direction a priori; we therefore
estimated it by choosing the direction that
minimizes the shading component of the objective
function given the surface optimized using only
the stereo component. In all three cases, the
main features of the faces (nose, mouth, and
eyes) have been correctly recovered. The
improvement is particularly striking in the case of
the  face  in  Figure  13.  The  shading  component 
was  able  to  achieve  this  result  because  it  uses 


Figure 10: Triplets of face images and corresponding disparity maps (courtesy of INRIA).


the monocular information around the specularities.
The stereo component cannot take advantage
of the information around the specularities
because very few points are visible in at least
two images simultaneously, and because there is
little texture. Of course, the effect of the specularities
has not completely disappeared (there
is indeed still a small artifact on the nose) but
has been outweighed by the surrounding information.
A more principled approach to solving
this problem would be to explicitly include a
specularity term in our shading model.

The graphs of Figure 14 depict the behavior
of the stereo and shading components of the
objective function for the three triplets. The
four values of the scores to the left of the thick


Figure  11:  Results  for  the  first  triplet  of  Figure  10.  (a)  Shaded  view  of  the  mesh  generated  by  smoothing  and 
triangulating  the  computed  disparity  map.  We  use  it  as  the  starting  condition  for  our  optimization  procedure. 
(b,c,d)  The  mesh  after  optimization  using  only  the  stereo  term,  with  progressively  less  smoothing.  (e,f,g) 
Several  views  of  the  mesh  after  optimization  using  both  stereo  and  shading,  (h)  The  recovered  albedo  map. 


dotted line, St_0 to St_3, correspond to the results
shown in the top row of Figures 11, 12,
and 13. The fifth value, St + Sh, corresponds
to the final results when shading is turned on.
These values have been scaled so that St_0 is
equal  to  one  for  all  triplets.  As  in  the  synthetic 
case,  when  using  stereo  alone,  the  stereo  com¬ 
ponent  always  improves,  but  as  the  recovered 
surface  becomes  rougher  the  shading  term  de¬ 
grades  dramatically.  However,  when  we  turn  on 
the  shading  component,  the  overall  results  im¬ 
prove  significantly,  even  though  the  stereo  com¬ 
ponent  degrades  slightly. 

6  Summary  and  Conclusion 

In  this  paper  we  have  presented  a  surface  recon¬ 
struction  method  that  uses  an  object-centered 
representation  (a  triangulated  mesh)  to  recover 
geometry and reflectance properties from multiple
images. It allows us to handle self-occlusions
while merging information from several
viewpoints, thereby allowing us to eliminate
blind spots and making the reconstruction
more  robust  where  more  than  one  view  is  avail¬ 
able.  The  reconstruction  process  relies  on  both 
monocular  shading  cues  and  stereoscopic  cues. 
We  use  these  cues  to  drive  an  optimization  pro¬ 
cedure  that  takes  advantage  of  their  respective 
strengths  while  eliminating  some  of  their  weak¬ 
nesses. 

Specifically,  stereo  information  is  very  ro¬ 
bust  in  textured  regions  but  potentially  unre¬ 
liable  elsewhere.  We  therefore  use  it  mainly  in 
such  areas  by  weighting  the  stereo  component 
most  strongly  for  facets  of  the  triangulation  that 
project into textured image areas. The component
compares the grey-levels of the points in
all of the images for which the projection of a
given  point  on  the  surface  is  visible,  as  deter¬ 
mined  using  a  hidden-surface  algorithm.  This 
comparison  is  done  for  a  uniform  sampling  of 


Figure  12:  Results  for  the  second  triplet  of  Figure  10  presented  in  the  same  fashion  as  in  Figure  11. 


the  surface.  This  method  allows  us  to  deal  with 
arbitrarily  slanted  regions  and  to  discount  oc¬ 
cluded  areas  of  the  surface. 
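
The comparison of grey-levels across views can be illustrated with a small sketch. It is hypothetical: the variance about the mean is used here as the measure of disagreement, which is our assumption and not necessarily the exact form of the stereo component, and the visibility test is taken as already done (each sample lists only the grey-levels from images in which it is visible).

```python
def stereo_score(samples):
    """Average grey-level disagreement over surface sample points.

    samples: one list per surface sample point, holding the grey-level of
    that point in every image where it is visible. The variance used here
    is an assumed measure of disagreement.
    """
    total, count = 0.0, 0
    for greys in samples:
        if len(greys) < 2:
            continue  # visible in fewer than two images: nothing to compare
        mean = sum(greys) / len(greys)
        total += sum((g - mean) ** 2 for g in greys) / len(greys)
        count += 1
    return total / count if count else 0.0

# Three sample points: consistent in three views, visible in only one view,
# and inconsistent in two views.
score = stereo_score([[10.0, 10.0, 10.0], [5.0], [0.0, 2.0]])
```

Sample points seen in a single image contribute nothing, which is how occluded areas are discounted in this sketch.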

On  the  other  hand,  shading  information  is 
mostly  helpful  in  textureless  areas.  Thus,  we 
weight  the  shading  component  most  strongly  for 
facets  that  project  into  such  areas.  The  com¬ 
ponent  uses  a  new  method  for  utilizing  shad¬ 
ing  information  that  does  not  need  the  tra¬ 
ditional  assumption  of  constant  albedo.  In¬ 
stead,  it  attempts  to  minimize  the  variation  in 
albedo  across  the  surface,  and  can  therefore  deal 
with  both  constant  albedo  surfaces  and  surfaces 
whose  albedo  varies  slowly.  However,  it  does  re¬ 
quire  the  boundary  conditions  that  are  provided 
by  the  stereo  information. 
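
The idea of minimizing albedo variation, and its use for estimating an unknown light direction as was done for the face images, can be sketched as follows. Everything here is an assumption made for illustration: the per-facet albedo is taken as intensity divided by the inner product of normal and light, the variation is a sum of squared differences between neighboring facets, and the light direction is picked from a small hypothetical candidate list rather than by continuous optimization.

```python
import math

def unit(v):
    """Return the 3-vector v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def facet_albedos(intensities, normals, light, eps=1e-6):
    """Per-facet Lambertian albedo estimate: intensity / (N . L)."""
    l = unit(light)
    return [i / max(dot(unit(n), l), eps)
            for i, n in zip(intensities, normals)]

def albedo_variation(albedos, neighbors):
    """Sum of squared albedo differences over pairs of adjacent facets."""
    return sum((albedos[a] - albedos[b]) ** 2 for a, b in neighbors)

def estimate_light(intensities, normals, neighbors, candidates):
    """Pick the candidate direction giving the least albedo variation."""
    return min(candidates,
               key=lambda l: albedo_variation(
                   facet_albedos(intensities, normals, l), neighbors))

# Four facets of constant albedo 0.7 lit from a known direction.
normals = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (0.0, 0.6, 0.8), (-0.6, 0.0, 0.8)]
true_light = (0.3, 0.2, 1.0)
intensities = [0.7 * dot(unit(n), unit(true_light)) for n in normals]
neighbors = [(0, 1), (1, 2), (2, 3), (3, 0)]
candidates = [(0.0, 0.0, 1.0), (0.3, 0.2, 1.0), (-0.3, 0.2, 1.0), (0.5, 0.5, 1.0)]
best = estimate_light(intensities, normals, neighbors, candidates)
```

With noise-free data the true direction yields a constant albedo and therefore zero variation, so it is selected from the candidates.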

We  have  developed  a  weighting  scheme  that 
allows  our  system  to  use  each  source  of  informa¬ 
tion  where  it  is  most  appropriate.  As  a  result, 
for the large class of surfaces that roughly satisfy
the Lambertian model, it performs significantly
better than if it were using either source
of  information  alone. 

Our  surface  model  can  be  naturally  aug¬ 


mented  to  include  specularities,  shadows  and 
self-shadows.  It  can  also  support  more  complex 
topologies,  multiple  resolutions  and  the  shrink¬ 
ing  or  growing  of  the  surface  of  interest,  though 
in  this  paper  we  concentrated  on  a  better  under¬ 
standing  of  the  behavior  of  the  objective  func¬ 
tion.  These  extensions  will  be  the  subject  of 
future  work. 

Acknowledgments 

We wish to thank Hervé Mathieu and Olivier
Monga, who provided us with the face images
and corresponding calibration data that appear in
this paper and that have proved extremely valuable to
our research effort. We would also like to apologize
to  the  members  of  the  INRIA  ROBOTVIS  project 
whose  faces  we  have  mercilessly  deformed  during  the 
development  of  the  algorithms  discussed  above. 

References 

[1] A. L. Abbot and N. Ahuja. Active surface
reconstruction by integrating focus, vergence,
stereo, and camera calibration. In ICCV, pages
480-492, 1990.

[2] J. Y. Aloimonos. Unification and integration
of visual modules: an extension of the Marr
paradigm. In IUW, pages 507-551, 1989.

[3] M. Asada, M. Kimura, Y. Taniguchi, and
Y. Shirai. Dynamic integration of height maps
into a 3d world representation from range image
sequences. IJCV, 9(1):31-54, October
1992.


Figure 13: Results for the third triplet of Figure 10 presented in the same fashion as in Figure 11.


Figure 14: Values of the stereo (a) and shading (b) components of the objective function for the face images.
The y axis represents the value of the components and the x axis the various stages of the optimization.
From left to right, we first use only stereo and decrease the smoothness and, to the right of the thick dotted
line, we turn on the shading term. Each curve is labeled with the number of the corresponding image triplet
and all values have been scaled so that the initial ones are equal to 1.0.


[4] E. P. Baltsavias. Multiphoto Geometrically
Constrained Matching. PhD thesis, Institute
for Geodesy and Photogrammetry, ETH Zurich,
December 1991.

[5] S. Barnard. Stochastic stereo matching over
scale. Int’l J. Computer Vision, 3(1):17-32,
1989.

[6] H. G. Barrow and J. M. Tenenbaum. Recovering
intrinsic scene characteristics from images.
In Computer Vision Systems, pages 3-26. Academic
Press, New York, New York, 1978.

[7] A. Blake, A. Zisserman, and G. Knowles. Surface
descriptions from stereo and shading. Image
Vision Comput., 3(4):183-191, 1985.

[8] Y. Choe and R. L. Kashyap. 3-d shape from
a shaded and textural surface image. T-PAMI,
13:907-919, 1991.

[9] I. Cohen, L. D. Cohen, and N. Ayache. Introducing
new deformable surfaces to segment 3d
images. In CVPR, pages 738-739, 1991.

[10] J. E. Cryer, Ping-Sing Tsai, and Mubarak
Shah. Combining shape from shading and
stereo using human vision model. Technical Report
CS-TR-92-25, U. Central Florida, 1992.

[11] H. Delingette, M. Hebert, and K. Ikeuchi.
Shape representation and image segmentation
using deformable surfaces. In CVPR, pages
467-472, 1991.

[12]  B.  Diehl  and  C.  Heipke.  Surface  reconstruction 
from  data  of  digital  line  cameras  by  means  of 
object  based  image  matching.  In  ISPRS,  pages 
287-294,  Washington  D.C.,  1992. 

[13]  O.D.  Faugeras  and  G.  Toscani.  The  Calibra¬ 
tion  Problem  for  Stereo.  In  Proceedings  of 
CVPR86, Miami  Beach, Florida,  pages  15-20, 
1986. 

[14] Frank P. Ferrie, Jean Lagarde, and Peter
Whaite. Recovery of volumetric object descriptions
from laser rangefinder images. In European
Conference on Computer Vision, Genoa,
Italy,  April  1992. 

[15]  P.  Fua.  A  parallel  stereo  algorithm  that  pro¬ 
duces  dense  depth  maps  and  preserves  image 
features.  Machine  Vision  and  Applications, 
1993.  In  print,  available  as  INRIA  research  re¬ 
port  1369. 


[16] P. Fua and Y. G. Leclerc. Model driven edge
detection. Machine Vision and Applications,
3:45-56, 1990.

[17] P. Fua and P. Sander. Reconstructing surfaces
from unstructured 3d points. In Proceedings of
the 1992 DARPA Image Understanding Workshop,
San Diego, California, January 1992.

[18] P. Fua and P. Sander. Reconstructing surfaces
from unstructured 3d points. In Image Understanding
Workshop, San Diego, California,
January 1992.

[19]  P.  V.  Fua.  Combining  stereo  and  monocular 
information  to  compute  dense  depth  maps  that 
preserve  depth  discontinuities.  In  Proceedings 
of  IJCAI,  Sydney,  Australia,  August  1991. 

[20]  W.  E.  L.  Grimson  and  D.  P.  Huttenlocher.  In¬ 
troduction  to  the  special  issue  on  interpretation 
of  3-d  scenes.  T-PAMI,  14(2):97-98,  February 
1992. 

[21] K. Hartt and M. Carlotto. A method for
shape-from-shading using multiple images acquired
under different viewing and lighting conditions.
In CVPR, pages 53-60, 1989.

[22]  C.  Heipke.  Integration  of  digital  image  match¬ 
ing  and  multi  image  shape  from  shading.  In  IS¬ 
PRS,  pages  832-841,  Washington  D.C.,  1992. 

[23] W. Hoff and N. Ahuja. Surfaces from stereo:
integrating feature matching, disparity estimation,
and contour detection. T-PAMI, 11:121-136,
1989.

[24] B. K. P. Horn. Height and gradient from
shading. Int’l J. Computer Vision, 5(1):37-75,
1990.

[25]  Y.  Hung,  D.  B.  Cooper,  and  B.  Cernuschi- 
Frias.  Asymptotic  bayesian  surface  estimation 
using  an  image  sequence.  IJCV,  6(2):105-132, 
June  1991. 

[26]  B.  Kaiser,  M.  Schmolla,  and  B.  P.  Wrobel.  Ap¬ 
plication  of  image  pyramid  for  surface  recon¬ 
struction  with  fast  vision.  In  ISPRS,  page  1, 
Washington,  D.C.,  1992. 

[27] M. Kass, A. Witkin, and D. Terzopoulos.
Snakes: Active contour models. International
Journal of Computer Vision, 1(4):321-331,
1988.

[28]  Y.  G.  Leclerc.  Constructing  simple  stable 
descriptions  for  image  partitioning.  Interna¬ 
tional  Journal  of  Computer  Vision,  3(1):73- 
102,  1989. 


[29] Y. G. Leclerc. The Local Structure of Image
Intensity Discontinuities. PhD thesis, McGill
University, Montréal, Quebec, Canada, May
1989.

[30] Y. G. Leclerc and A. F. Bobick. The direct
computation of height from shading. In Proceedings
of the 1991 Computer Society Conference
on Computer Vision and Pattern Recognition,
Lahaina, Maui, Hawaii, June 1991.

[31] Y. G. Leclerc and A. F. Bobick. The direct
computation of height from shading. In
Proceedings of the 1991 DARPA Image Understanding
Workshop, San Diego, California,
January 1992.

[32] C. E. Liedtke, H. Busch, and R. Koch. Shape
adaptation for modelling of 3d objects in natural
scenes. In CVPR, pages 704-705, 1991.

[33]  D.  G.  Lowe.  Fitting  parameterised  three- 
dimensional  models  to  images.  T-PAMI, 
13(441-450),  1991. 

[34]  D.  G.  Luenberger.  Linear  and  Nonlinear  Pro¬ 
gramming.  Addison-Wesley,  Menlo  Park,  Cali¬ 
fornia,  second  edition,  1984. 

[35] D. Marr. Vision. W. H. Freeman, San Francisco,
California, 1982.

[36] A. Pentland. Automatic extraction of deformable
part models. International Journal of
Computer Vision, 4(2):107-126, March 1990.

[37] A. Pentland and S. Sclaroff. Closed-form solutions
for physically based shape modeling and
recognition. T-PAMI, 13:715-729, 1991.

[38]  W.  H.  Press,  B.  P.  Flannery,  S.  A.  Teukol- 
sky,  and  W.  T.  Vetterling.  Numerical  recipes, 
the  art  of  scientific  computing.  Cambridge  U. 
Press,  Cambridge,  MA,  1986. 

[39]  L.  H.  Quam.  Hierarchical  warp  stereo.  In 
Proceedings  of  the  1984  DARPA  Image  Under¬ 
standing  Workshop,  pages  149-155,  1984. 

[40]  E.  M.  Stokely  and  S.  Y.  Wu.  Surface  parame¬ 
terization  and  curvature  measurement  of  arbi¬ 
trary  3-d  objects;  five  practical  methods.  T- 
PAMI,  14(8):833-839,  August  1992. 

[41]  R.  Szeliski.  Shape  from  rotation.  In  CVPR, 
pages  625-630, 1991. 

[42]  R.  Szeliski  and  D.  Tonnesen.  Surface  model¬ 
ing  with  oriented  particle  systems.  In  Com¬ 
puter  Graphics  (SIGGRAPH'92),  pages  185- 
194,  July  1992. 


[43]  D.  Terzopoulos.  Regularization  of  inverse  vi¬ 
sual  problems  involving  discontinuities.  IEEE 
Transactions on Pattern Analysis and Machine
Intelligence,  8:413-424,  1986. 

[44]  D.  Terzopoulos.  The  computation  of  visible- 
surface  representations.  T-PAMI,  pages  417- 
438,  1988. 

[45]  D.  Terzopoulos  and  D.  Metaxas.  Dynamic  3d 
models  with  local  and  global  deformations:  De¬ 
formable  superquadrics.  T-PAMI,  13(703-714), 
1991. 

[46] D. Terzopoulos and M. Vasilescu. Sampling
and reconstruction with adaptive meshes. In
CVPR, pages 70-75, 1991.

[47] D. Terzopoulos, A. Witkin, and M. Kass.
Symmetry-seeking models and 3d object reconstruction.
IJCV, 1:211-221, 1987.

[48] C. Tomasi and T. Kanade. The factorization
method for the recovery of shape and motion
from image streams. In Proceedings of the
1992 DARPA Image Understanding Workshop,
pages 459-472. DARPA, January 1992.

[49]  B.  C.  Vemuri  and  R.  Malladi.  Deformable  mod¬ 
els:  Canonical  parameters  for  surface  represen¬ 
tation  and  multiple  view  integration.  In  CVPR, 
pages  724-725, 1991. 

[50]  Y.  F.  Wang  and  J.  F.  Wang.  Surface  recon¬ 
struction  using  deformable  models  with  interior 
and  boundary  constraints.  T-PAMI,  14(5):572- 
579,  May  1992. 

[51] P. Whaite and F. P. Ferrie. From uncertainty
to  visual  exploration.  T-PAMI,  13(1038-1049), 
1991. 

[52]  A.  W.  Witkin,  D.  Terzopoulos,  and  M.  Kass. 
Signal  matching  through  scale  space.  Interna¬ 
tional  Journal  of  Computer  Vision,  1:133-144, 
1987. 

[53] B. P. Wrobel. The evolution of digital
photogrammetry from analytical photogrammetry.
Photogrammetric Record, 13(77):765-776,
April 1991.


